* [PATCH v5 0/4] Cleaning printk stuff in NMI context @ 2016-04-21 11:48 Petr Mladek 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek ` (4 more replies) 0 siblings, 5 replies; 32+ messages in thread From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Petr Mladek This patch set generalizes the already existing solution for printing NMI messages. The main idea comes from Peter Zijlstra. v5 adds changes suggested by Sergey Senozhatsky. It should not longer conflict with his async printk patchset. There are some conflicts with the nmi_backtrace improvements from Chris Metcalf, see https://lkml.kernel.org/g/<1459877208-15119-1-git-send-email-cmetcalf@mellanox.com> Feel free to ask me to resolve them. Changes against v4: + merged 2nd patch into the 1st one to do not break bisection + set vprintk_nmi() early in nmi_enter() and set vprintk_default() later in nmi_exit() to make sure that all NMI messages are printed to the temporary ring buffer + used printk_deferred() when flushing the temporary buffers on panic in NMI; we do not longer need to touch vprintk_emit() and define the ugly macro deferred_console_in_nmi(). Changes against v3: + merged all small changes from -mm tree, including commit descriptions + disabled interrupts when taking the read_lock in __printk_nmi_flush(); printk_nmi_flush() might be called from any context; reported by lockdep + never introduce NEED_PRINTK_NMI; in -mm tree was introduced in 1st patch and removed in the 4th one + flush NMI buffers when the system goes down (new 5th patch); addresses Daniel's concerns Changes against v2: + fixed compilation problems reported by 0-day build robot + MN10300 and Xtensa architectures will get handled separately + dropped the patch that printed NMI messages directly when Oops in progress; it made the solution less reliable + made the size of the buffer configurable; use real numbers instead of PAGE_SIZE Changes against v1: + rebased on top of 4.4-rc2; there the old implementation was moved to lib/nmi_backtrace.c and used also on arm; I hope that I got the arm side correctly; I was not able to test on arm :-( + defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7; handle_fiq_as_nmi() is called from entry-armv.S that is compiled when !CPU_V7M + defined HAVE_NMI also on mips; it calls nmi_enter() and seems to have real NMIs (or am I wrong?) + serialized backtraces when printing directly (oops_in_progress) Petr Mladek (4): printk/nmi: generic solution for safe printk in NMI printk/nmi: warn when some message has been lost in NMI context printk/nmi: increase the size of NMI buffer and make it configurable printk/nmi: flush NMI messages on the system panic arch/Kconfig | 4 + arch/arm/Kconfig | 1 + arch/arm/kernel/smp.c | 2 + arch/avr32/Kconfig | 1 + arch/blackfin/Kconfig | 1 + arch/cris/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/tile/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/kernel/apic/hw_nmi.c | 1 - include/linux/hardirq.h | 2 + include/linux/percpu.h | 3 - include/linux/printk.h | 14 ++- init/Kconfig | 27 +++++ init/main.c | 1 + kernel/kexec_core.c | 1 + kernel/panic.c | 6 +- kernel/printk/Makefile | 1 + kernel/printk/internal.h | 57 +++++++++ kernel/printk/nmi.c | 260 ++++++++++++++++++++++++++++++++++++++++++ kernel/printk/printk.c | 31 ++--- lib/nmi_backtrace.c | 89 +-------------- 26 files changed, 401 insertions(+), 109 deletions(-) create mode 100644 kernel/printk/internal.h create mode 100644 kernel/printk/nmi.c -- 1.8.5.6 ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek @ 2016-04-21 11:48 ` Petr Mladek 2016-04-27 9:31 ` Russell King - ARM Linux 2017-04-19 17:13 ` Steven Rostedt 2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek ` (3 subsequent siblings) 4 siblings, 2 replies; 32+ messages in thread From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller printk() takes some locks and could not be used a safe way in NMI context. The chance of a deadlock is real especially when printing stacks from all CPUs. This particular problem has been addressed on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost safe on all architectures (the temporary buffer is limited. We still should keep the number of messages in NMI context at minimum). Note that there already are several messages printed in NMI context: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE handlers. These are not easy to avoid. This patch reuses most of the code and makes it generic. It is useful for all messages and architectures that support NMI. The alternative printk_func is set when entering and is reseted when leaving NMI context. It queues IRQ work to copy the messages into the main ring buffer in a safe context. __printk_nmi_flush() copies all available messages and reset the buffer. Then we could use a simple cmpxchg operations to get synchronized with writers. There is also used a spinlock to get synchronized with other flushers. We do not longer use seq_buf because it depends on external lock. It would be hard to make all supported operations safe for a lockless use. It would be confusing and error prone to make only some operations safe. The code is put into separate printk/nmi.c as suggested by Steven Rostedt. It needs a per-CPU buffer and is compiled only on architectures that call nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. The are MN10300 and Xtensa architectures. We need to clean up NMI handling there first. Let's do it separately. The patch is heavily based on the draft from Peter Zijlstra, see https://lkml.org/lkml/2015/6/10/327 [arnd@arndb.de: printk-nmi: use %zu format string for size_t] [akpm@linux-foundation.org: min_t->min - all types are size_t here] Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/Kconfig | 4 + arch/arm/Kconfig | 1 + arch/arm/kernel/smp.c | 2 + arch/avr32/Kconfig | 1 + arch/blackfin/Kconfig | 1 + arch/cris/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/tile/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/kernel/apic/hw_nmi.c | 1 - include/linux/hardirq.h | 2 + include/linux/percpu.h | 3 - include/linux/printk.h | 12 ++- init/Kconfig | 5 + init/main.c | 1 + kernel/printk/Makefile | 1 + kernel/printk/internal.h | 44 +++++++++ kernel/printk/nmi.c | 219 ++++++++++++++++++++++++++++++++++++++++++ kernel/printk/printk.c | 19 +--- lib/nmi_backtrace.c | 89 +---------------- 24 files changed, 306 insertions(+), 107 deletions(-) create mode 100644 kernel/printk/internal.h create mode 100644 kernel/printk/nmi.c diff --git a/arch/Kconfig b/arch/Kconfig index 81869a5e7e17..049f243a7af8 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -187,7 +187,11 @@ config HAVE_OPTPROBES config HAVE_KPROBES_ON_FTRACE bool +config HAVE_NMI + bool + config HAVE_NMI_WATCHDOG + depends on HAVE_NMI bool # # An arch should select this if it provides all these things: diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index cdfa6c2b7626..259543ec6dc9 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -66,6 +66,7 @@ config ARM select HAVE_KRETPROBES if (HAVE_KPROBES) select HAVE_MEMBLOCK select HAVE_MOD_ARCH_SPECIFIC + select HAVE_NMI select HAVE_OPROFILE if (HAVE_PERF_EVENTS) select HAVE_OPTPROBES if !THUMB2_KERNEL select HAVE_PERF_EVENTS diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index baee70267f29..df90bc59bfce 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs) break; case IPI_CPU_BACKTRACE: + printk_nmi_enter(); irq_enter(); nmi_cpu_backtrace(regs); irq_exit(); + printk_nmi_exit(); break; default: diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig index b6878eb64884..9dc3e2b1180b 100644 --- a/arch/avr32/Kconfig +++ b/arch/avr32/Kconfig @@ -17,6 +17,7 @@ config AVR32 select GENERIC_CLOCKEVENTS select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA + select HAVE_NMI help AVR32 is a high-performance 32-bit RISC microprocessor core, designed for cost-sensitive embedded applications, with particular diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig index a63c12259e77..28c63fea786d 100644 --- a/arch/blackfin/Kconfig +++ b/arch/blackfin/Kconfig @@ -40,6 +40,7 @@ config BLACKFIN select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA select HAVE_DEBUG_STACKOVERFLOW + select HAVE_NMI config GENERIC_CSUM def_bool y diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig index e086f9e93728..62148eaf0189 100644 --- a/arch/cris/Kconfig +++ b/arch/cris/Kconfig @@ -69,6 +69,7 @@ config CRIS select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32 select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32 select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32 + select HAVE_NMI config HZ int diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 2018c2b0e078..7de958bb19f1 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -62,6 +62,7 @@ config MIPS select HAVE_IRQ_TIME_ACCOUNTING select GENERIC_TIME_VSYSCALL select ARCH_CLOCKSOURCE_DATA + select HAVE_NMI menu "Machine selection" diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 7cd32c038286..f99b19d2d572 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -153,6 +153,7 @@ config PPC select NO_BOOTMEM select HAVE_GENERIC_RCU_GUP select HAVE_PERF_EVENTS_NMI if PPC64 + select HAVE_NMI if PERF_EVENTS select EDAC_SUPPORT select EDAC_ATOMIC_SCRUB select ARCH_HAS_DMA_SET_COHERENT_MASK diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index bf24ab188921..42d315fb8a00 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -164,6 +164,7 @@ config S390 select TTY select VIRT_CPU_ACCOUNTING select VIRT_TO_BUS + select HAVE_NMI config SCHED_OMIT_FRAME_POINTER diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 7ed20fc3fc81..1725e365d751 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -44,6 +44,7 @@ config SUPERH select OLD_SIGSUSPEND select OLD_SIGACTION select HAVE_ARCH_AUDITSYSCALL + select HAVE_NMI help The SuperH is a RISC processor targeted for use in embedded systems and consumer electronics; it was also used in the Sega Dreamcast diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 57ffaf285c2f..a900e7f35f6a 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -79,6 +79,7 @@ config SPARC64 select NO_BOOTMEM select HAVE_ARCH_AUDITSYSCALL select ARCH_SUPPORTS_ATOMIC_RMW + select HAVE_NMI config ARCH_DEFCONFIG string diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig index 81719302b056..cedcbbefcd1c 100644 --- a/arch/tile/Kconfig +++ b/arch/tile/Kconfig @@ -29,6 +29,7 @@ config TILE select HAVE_DEBUG_STACKOVERFLOW select ARCH_WANT_FRAME_POINTERS select HAVE_CONTEXT_TRACKING + select HAVE_NMI if USE_PMC select EDAC_SUPPORT select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2dc18605831f..30ea1f834a06 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -130,6 +130,7 @@ config X86 select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP select HAVE_MIXED_BREAKPOINTS_REGS + select HAVE_NMI select HAVE_OPROFILE select HAVE_OPTPROBES select HAVE_PCSPKR_PLATFORM diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c index 045e424fb368..7788ce643bf4 100644 --- a/arch/x86/kernel/apic/hw_nmi.c +++ b/arch/x86/kernel/apic/hw_nmi.c @@ -18,7 +18,6 @@ #include <linux/nmi.h> #include <linux/module.h> #include <linux/delay.h> -#include <linux/seq_buf.h> #ifdef CONFIG_HARDLOCKUP_DETECTOR u64 hw_nmi_get_sample_period(int watchdog_thresh) diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index dfd59d6bc6f0..c683996110b1 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -61,6 +61,7 @@ extern void irq_exit(void); #define nmi_enter() \ do { \ + printk_nmi_enter(); \ lockdep_off(); \ ftrace_nmi_enter(); \ BUG_ON(in_nmi()); \ @@ -77,6 +78,7 @@ extern void irq_exit(void); preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \ ftrace_nmi_exit(); \ lockdep_on(); \ + printk_nmi_exit(); \ } while (0) #endif /* LINUX_HARDIRQ_H */ diff --git a/include/linux/percpu.h b/include/linux/percpu.h index 4bc6dafb703e..56939d3f6e53 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -129,7 +129,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr); (typeof(type) __percpu *)__alloc_percpu(sizeof(type), \ __alignof__(type)) -/* To avoid include hell, as printk can not declare this, we declare it here */ -DECLARE_PER_CPU(printk_func_t, printk_func); - #endif /* __LINUX_PERCPU_H */ diff --git a/include/linux/printk.h b/include/linux/printk.h index 9ccbdf2c1453..51dd6b824fe2 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold void early_printk(const char *s, ...) { } #endif -typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args); +#ifdef CONFIG_PRINTK_NMI +extern void printk_nmi_init(void); +extern void printk_nmi_enter(void); +extern void printk_nmi_exit(void); +extern void printk_nmi_flush(void); +#else +static inline void printk_nmi_init(void) { } +static inline void printk_nmi_enter(void) { } +static inline void printk_nmi_exit(void) { } +static inline void printk_nmi_flush(void) { } +#endif /* PRINTK_NMI */ #ifdef CONFIG_PRINTK asmlinkage __printf(5, 0) diff --git a/init/Kconfig b/init/Kconfig index 0dfd09d54c65..85c7a2bf1ea4 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1454,6 +1454,11 @@ config PRINTK very difficult to diagnose system problems, saying N here is strongly discouraged. +config PRINTK_NMI + def_bool y + depends on PRINTK + depends on HAVE_NMI + config BUG bool "BUG() support" if EXPERT default y diff --git a/init/main.c b/init/main.c index b3c6e363ae18..c0d3ff91c8e0 100644 --- a/init/main.c +++ b/init/main.c @@ -569,6 +569,7 @@ asmlinkage __visible void __init start_kernel(void) timekeeping_init(); time_init(); sched_clock_postinit(); + printk_nmi_init(); perf_event_init(); profile_init(); call_function_init(); diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile index 85405bdcf2b3..abb0042a427b 100644 --- a/kernel/printk/Makefile +++ b/kernel/printk/Makefile @@ -1,2 +1,3 @@ obj-y = printk.o +obj-$(CONFIG_PRINTK_NMI) += nmi.o obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h new file mode 100644 index 000000000000..2de99faedfc1 --- /dev/null +++ b/kernel/printk/internal.h @@ -0,0 +1,44 @@ +/* + * internal.h - printk internal definitions + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ +#include <linux/percpu.h> + +typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args); + +int __printf(1, 0) vprintk_default(const char *fmt, va_list args); + +#ifdef CONFIG_PRINTK_NMI + +/* + * printk() could not take logbuf_lock in NMI context. Instead, + * it temporary stores the strings into a per-CPU buffer. + * The alternative implementation is chosen transparently + * via per-CPU variable. + */ +DECLARE_PER_CPU(printk_func_t, printk_func); +static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) +{ + return this_cpu_read(printk_func)(fmt, args); +} + +#else /* CONFIG_PRINTK_NMI */ + +static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) +{ + return vprintk_default(fmt, args); +} + +#endif /* CONFIG_PRINTK_NMI */ diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c new file mode 100644 index 000000000000..303cf0d15e57 --- /dev/null +++ b/kernel/printk/nmi.c @@ -0,0 +1,219 @@ +/* + * nmi.c - Safe printk in NMI context + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/preempt.h> +#include <linux/spinlock.h> +#include <linux/smp.h> +#include <linux/cpumask.h> +#include <linux/irq_work.h> +#include <linux/printk.h> + +#include "internal.h" + +/* + * printk() could not take logbuf_lock in NMI context. Instead, + * it uses an alternative implementation that temporary stores + * the strings into a per-CPU buffer. The content of the buffer + * is later flushed into the main ring buffer via IRQ work. + * + * The alternative implementation is chosen transparently + * via @printk_func per-CPU variable. + * + * The implementation allows to flush the strings also from another CPU. + * There are situations when we want to make sure that all buffers + * were handled or when IRQs are blocked. + */ +DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default; +static int printk_nmi_irq_ready; + +#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work)) + +struct nmi_seq_buf { + atomic_t len; /* length of written data */ + struct irq_work work; /* IRQ work that flushes the buffer */ + unsigned char buffer[NMI_LOG_BUF_LEN]; +}; +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq); + +/* + * Safe printk() for NMI context. It uses a per-CPU buffer to + * store the message. NMIs are not nested, so there is always only + * one writer running. But the buffer might get flushed from another + * CPU, so we need to be careful. + */ +static int vprintk_nmi(const char *fmt, va_list args) +{ + struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq); + int add = 0; + size_t len; + +again: + len = atomic_read(&s->len); + + if (len >= sizeof(s->buffer)) + return 0; + + /* + * Make sure that all old data have been read before the buffer was + * reseted. This is not needed when we just append data. + */ + if (!len) + smp_rmb(); + + add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args); + + /* + * Do it once again if the buffer has been flushed in the meantime. + * Note that atomic_cmpxchg() is an implicit memory barrier that + * makes sure that the data were written before updating s->len. + */ + if (atomic_cmpxchg(&s->len, len, len + add) != len) + goto again; + + /* Get flushed in a more safe context. */ + if (add && printk_nmi_irq_ready) { + /* Make sure that IRQ work is really initialized. */ + smp_rmb(); + irq_work_queue(&s->work); + } + + return add; +} + +/* + * printk one line from the temporary buffer from @start index until + * and including the @end index. + */ +static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end) +{ + const char *buf = s->buffer + start; + + printk("%.*s", (end - start) + 1, buf); +} + +/* + * Flush data from the associated per_CPU buffer. The function + * can be called either via IRQ work or independently. + */ +static void __printk_nmi_flush(struct irq_work *work) +{ + static raw_spinlock_t read_lock = + __RAW_SPIN_LOCK_INITIALIZER(read_lock); + struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work); + unsigned long flags; + size_t len, size; + int i, last_i; + + /* + * The lock has two functions. First, one reader has to flush all + * available message to make the lockless synchronization with + * writers easier. Second, we do not want to mix messages from + * different CPUs. This is especially important when printing + * a backtrace. + */ + raw_spin_lock_irqsave(&read_lock, flags); + + i = 0; +more: + len = atomic_read(&s->len); + + /* + * This is just a paranoid check that nobody has manipulated + * the buffer an unexpected way. If we printed something then + * @len must only increase. + */ + if (i && i >= len) + pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n", + i, len); + + if (!len) + goto out; /* Someone else has already flushed the buffer. */ + + /* Make sure that data has been written up to the @len */ + smp_rmb(); + + size = min(len, sizeof(s->buffer)); + last_i = i; + + /* Print line by line. */ + for (; i < size; i++) { + if (s->buffer[i] == '\n') { + print_nmi_seq_line(s, last_i, i); + last_i = i + 1; + } + } + /* Check if there was a partial line. */ + if (last_i < size) { + print_nmi_seq_line(s, last_i, size - 1); + pr_cont("\n"); + } + + /* + * Check that nothing has got added in the meantime and truncate + * the buffer. Note that atomic_cmpxchg() is an implicit memory + * barrier that makes sure that the data were copied before + * updating s->len. + */ + if (atomic_cmpxchg(&s->len, len, 0) != len) + goto more; + +out: + raw_spin_unlock_irqrestore(&read_lock, flags); +} + +/** + * printk_nmi_flush - flush all per-cpu nmi buffers. + * + * The buffers are flushed automatically via IRQ work. This function + * is useful only when someone wants to be sure that all buffers have + * been flushed at some point. + */ +void printk_nmi_flush(void) +{ + int cpu; + + for_each_possible_cpu(cpu) + __printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work); +} + +void __init printk_nmi_init(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu); + + init_irq_work(&s->work, __printk_nmi_flush); + } + + /* Make sure that IRQ works are initialized before enabling. */ + smp_wmb(); + printk_nmi_irq_ready = 1; + + /* Flush pending messages that did not have scheduled IRQ works. */ + printk_nmi_flush(); +} + +void printk_nmi_enter(void) +{ + this_cpu_write(printk_func, vprintk_nmi); +} + +void printk_nmi_exit(void) +{ + this_cpu_write(printk_func, vprintk_default); +} diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index bfbf284e4218..71eba0607034 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -55,6 +55,7 @@ #include "console_cmdline.h" #include "braille.h" +#include "internal.h" int console_printk[4] = { CONSOLE_LOGLEVEL_DEFAULT, /* console_loglevel */ @@ -1807,14 +1808,6 @@ int vprintk_default(const char *fmt, va_list args) } EXPORT_SYMBOL_GPL(vprintk_default); -/* - * This allows printk to be diverted to another function per cpu. - * This is useful for calling printk functions from within NMI - * without worrying about race conditions that can lock up the - * box. - */ -DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default; - /** * printk - print a kernel message * @fmt: format string @@ -1838,21 +1831,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default; */ asmlinkage __visible int printk(const char *fmt, ...) { - printk_func_t vprintk_func; va_list args; int r; va_start(args, fmt); - - /* - * If a caller overrides the per_cpu printk_func, then it needs - * to disable preemption when calling printk(). Otherwise - * the printk_func should be set to the default. No need to - * disable preemption here. - */ - vprintk_func = this_cpu_read(printk_func); r = vprintk_func(fmt, args); - va_end(args); return r; diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c index 6019c53c669e..26caf51cc238 100644 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@ -16,33 +16,14 @@ #include <linux/delay.h> #include <linux/kprobes.h> #include <linux/nmi.h> -#include <linux/seq_buf.h> #ifdef arch_trigger_all_cpu_backtrace /* For reliability, we're prepared to waste bits here. */ static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly; -static cpumask_t printtrace_mask; - -#define NMI_BUF_SIZE 4096 - -struct nmi_seq_buf { - unsigned char buffer[NMI_BUF_SIZE]; - struct seq_buf seq; -}; - -/* Safe printing in NMI context */ -static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq); /* "in progress" flag of arch_trigger_all_cpu_backtrace */ static unsigned long backtrace_flag; -static void print_seq_line(struct nmi_seq_buf *s, int start, int end) -{ - const char *buf = s->buffer + start; - - printk("%.*s", (end - start) + 1, buf); -} - /* * When raise() is called it will be is passed a pointer to the * backtrace_mask. Architectures that call nmi_cpu_backtrace() @@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end) void nmi_trigger_all_cpu_backtrace(bool include_self, void (*raise)(cpumask_t *mask)) { - struct nmi_seq_buf *s; - int i, cpu, this_cpu = get_cpu(); + int i, this_cpu = get_cpu(); if (test_and_set_bit(0, &backtrace_flag)) { /* @@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self, if (!include_self) cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask)); - cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask)); - - /* - * Set up per_cpu seq_buf buffers that the NMIs running on the other - * CPUs will write to. - */ - for_each_cpu(cpu, to_cpumask(backtrace_mask)) { - s = &per_cpu(nmi_print_seq, cpu); - seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE); - } - if (!cpumask_empty(to_cpumask(backtrace_mask))) { pr_info("Sending NMI to %s CPUs:\n", (include_self ? "all" : "other")); @@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self, } /* - * Now that all the NMIs have triggered, we can dump out their - * back traces safely to the console. + * Force flush any remote buffers that might be stuck in IRQ context + * and therefore could not run their irq_work. */ - for_each_cpu(cpu, &printtrace_mask) { - int len, last_i = 0; + printk_nmi_flush(); - s = &per_cpu(nmi_print_seq, cpu); - len = seq_buf_used(&s->seq); - if (!len) - continue; - - /* Print line by line. */ - for (i = 0; i < len; i++) { - if (s->buffer[i] == '\n') { - print_seq_line(s, last_i, i); - last_i = i + 1; - } - } - /* Check if there was a partial line. */ - if (last_i < len) { - print_seq_line(s, last_i, len - 1); - pr_cont("\n"); - } - } - - clear_bit(0, &backtrace_flag); - smp_mb__after_atomic(); + clear_bit_unlock(0, &backtrace_flag); put_cpu(); } -/* - * It is not safe to call printk() directly from NMI handlers. - * It may be fine if the NMI detected a lock up and we have no choice - * but to do so, but doing a NMI on all other CPUs to get a back trace - * can be done with a sysrq-l. We don't want that to lock up, which - * can happen if the NMI interrupts a printk in progress. - * - * Instead, we redirect the vprintk() to this nmi_vprintk() that writes - * the content into a per cpu seq_buf buffer. Then when the NMIs are - * all done, we can safely dump the contents of the seq_buf to a printk() - * from a non NMI context. - */ -static int nmi_vprintk(const char *fmt, va_list args) -{ - struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq); - unsigned int len = seq_buf_used(&s->seq); - - seq_buf_vprintf(&s->seq, fmt, args); - return seq_buf_used(&s->seq) - len; -} - bool nmi_cpu_backtrace(struct pt_regs *regs) { int cpu = smp_processor_id(); if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { - printk_func_t printk_func_save = this_cpu_read(printk_func); - - /* Replace printk to write into the NMI seq */ - this_cpu_write(printk_func, nmi_vprintk); pr_warn("NMI backtrace for cpu %d\n", cpu); if (regs) show_regs(regs); else dump_stack(); - this_cpu_write(printk_func, printk_func_save); - cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); return true; } -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek @ 2016-04-27 9:31 ` Russell King - ARM Linux 2017-04-19 17:13 ` Steven Rostedt 1 sibling, 0 replies; 32+ messages in thread From: Russell King - ARM Linux @ 2016-04-27 9:31 UTC (permalink / raw) To: Petr Mladek Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, Apr 21, 2016 at 01:48:42PM +0200, Petr Mladek wrote: > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index cdfa6c2b7626..259543ec6dc9 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -66,6 +66,7 @@ config ARM > select HAVE_KRETPROBES if (HAVE_KPROBES) > select HAVE_MEMBLOCK > select HAVE_MOD_ARCH_SPECIFIC > + select HAVE_NMI > select HAVE_OPROFILE if (HAVE_PERF_EVENTS) > select HAVE_OPTPROBES if !THUMB2_KERNEL > select HAVE_PERF_EVENTS > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index baee70267f29..df90bc59bfce 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs) > break; > > case IPI_CPU_BACKTRACE: > + printk_nmi_enter(); > irq_enter(); > nmi_cpu_backtrace(regs); > irq_exit(); > + printk_nmi_exit(); > break; > > default: For the above, Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek 2016-04-27 9:31 ` Russell King - ARM Linux @ 2017-04-19 17:13 ` Steven Rostedt 2017-04-19 17:21 ` Peter Zijlstra 2017-04-20 3:31 ` Sergey Senozhatsky 1 sibling, 2 replies; 32+ messages in thread From: Steven Rostedt @ 2017-04-19 17:13 UTC (permalink / raw) To: Petr Mladek Cc: Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, 21 Apr 2016 13:48:42 +0200 Petr Mladek <pmladek@suse.com> wrote: > printk() takes some locks and could not be used a safe way in NMI context. I just found a problem with this solution. It kills ftrace dumps from NMI context :-( [ 1295.168495] <...>-67423 10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock [ 1295.168495] <...>-67423 10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns] [ 1295.168496] <...>-67423 10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock [ 1295.168497] [ 1295.168498] Lost 4890096 message(s)! [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40) [ 1296.811553] Call Trace: [ 1296.811553] <NMI> I was hoping to see a cause of a hard lockup by enabling ftrace_dump_on_oops. But as NMIs now have a very small buffer that gets flushed, we need to find a new way to print out the full ftrace buffer over serial. Thoughts? -- Steve > > The chance of a deadlock is real especially when printing stacks from all > CPUs. This particular problem has been addressed on x86 by the commit > a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). > > The patchset brings two big advantages. First, it makes the NMI > backtraces safe on all architectures for free. Second, it makes all NMI > messages almost safe on all architectures (the temporary buffer is > limited. We still should keep the number of messages in NMI context at > minimum). > > Note that there already are several messages printed in NMI context: > WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE > handlers. These are not easy to avoid. > > This patch reuses most of the code and makes it generic. It is useful for > all messages and architectures that support NMI. > > The alternative printk_func is set when entering and is reseted when > leaving NMI context. It queues IRQ work to copy the messages into the > main ring buffer in a safe context. > > __printk_nmi_flush() copies all available messages and reset the buffer. > Then we could use a simple cmpxchg operations to get synchronized with > writers. There is also used a spinlock to get synchronized with other > flushers. > > We do not longer use seq_buf because it depends on external lock. It > would be hard to make all supported operations safe for a lockless use. > It would be confusing and error prone to make only some operations safe. > > The code is put into separate printk/nmi.c as suggested by Steven Rostedt. > It needs a per-CPU buffer and is compiled only on architectures that call > nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. > > The are MN10300 and Xtensa architectures. We need to clean up NMI > handling there first. Let's do it separately. > > The patch is heavily based on the draft from Peter Zijlstra, see > https://lkml.org/lkml/2015/6/10/327 > > [arnd@arndb.de: printk-nmi: use %zu format string for size_t] > [akpm@linux-foundation.org: min_t->min - all types are size_t here] > Signed-off-by: Petr Mladek <pmladek@suse.com> > Suggested-by: Peter Zijlstra <peterz@infradead.org> > Suggested-by: Steven Rostedt <rostedt@goodmis.org> > Cc: Jan Kara <jack@suse.cz> > Cc: Russell King <rmk+kernel@arm.linux.org.uk> > Cc: Daniel Thompson <daniel.thompson@linaro.org> > Cc: Jiri Kosina <jkosina@suse.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ralf Baechle <ralf@linux-mips.org> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> > Cc: David Miller <davem@davemloft.net> > Cc: Daniel Thompson <daniel.thompson@linaro.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-19 17:13 ` Steven Rostedt @ 2017-04-19 17:21 ` Peter Zijlstra 2017-04-20 3:31 ` Sergey Senozhatsky 1 sibling, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2017-04-19 17:21 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Andrew Morton, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Wed, Apr 19, 2017 at 01:13:41PM -0400, Steven Rostedt wrote: > On Thu, 21 Apr 2016 13:48:42 +0200 > Petr Mladek <pmladek@suse.com> wrote: > > > printk() takes some locks and could not be used a safe way in NMI context. > > I just found a problem with this solution. It kills ftrace dumps from > NMI context :-( > > [ 1295.168495] <...>-67423 10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock > [ 1295.168495] <...>-67423 10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns] > [ 1295.168496] <...>-67423 10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock > [ 1295.168497] > [ 1295.168498] Lost 4890096 message(s)! > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40) > [ 1296.811553] Call Trace: > [ 1296.811553] <NMI> > > I was hoping to see a cause of a hard lockup by enabling > ftrace_dump_on_oops. But as NMIs now have a very small buffer that > gets flushed, we need to find a new way to print out the full ftrace > buffer over serial. > > Thoughts? early_printk ;-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-19 17:13 ` Steven Rostedt 2017-04-19 17:21 ` Peter Zijlstra @ 2017-04-20 3:31 ` Sergey Senozhatsky 2017-04-20 13:11 ` Petr Mladek 1 sibling, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2017-04-20 3:31 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller Hello Steven, On (04/19/17 13:13), Steven Rostedt wrote: > > printk() takes some locks and could not be used a safe way in NMI context. > > I just found a problem with this solution. It kills ftrace dumps from > NMI context :-( > > [ 1295.168495] <...>-67423 10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock > [ 1295.168495] <...>-67423 10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns] > [ 1295.168496] <...>-67423 10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock > [ 1295.168497] > [ 1295.168498] Lost 4890096 message(s)! > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40) > [ 1296.811553] Call Trace: > [ 1296.811553] <NMI> > > I was hoping to see a cause of a hard lockup by enabling > ftrace_dump_on_oops. But as NMIs now have a very small buffer that > gets flushed, we need to find a new way to print out the full ftrace > buffer over serial. > > Thoughts? hmmm... a really tough one. well, someone has to say this: the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :) just thinking (well, sort of) out loud. the problem is that we can't tell if we already hold any printk related locks ("printk related locks" is not even well defined term). so printk from NMI can deadlock or it can be OK, we never know. but looking and vprintk_emit() and console_unlock() it seems that we have some sort of a hint now, which is this_cpu_read(printk_context) - if we are not in printk_safe context then we can say that _probably_ (and that's a Russian roulette) doing "normal" printk() will work. that is a *very-very* risky (and admittedly dumb) thing to assume, so we will move in a slightly different direction. checking this_cpu_read(printk_context) only assures us that we don't hold `logbuf_lock' on this CPU. and that is sort of something, at least we can be sure that doing printk_deferred() from this CPU is safe. printk_deferred() means that your NMI messages will end up in the logbuf, which is a) bigger in size than per-CPU buffer and b) some other CPU can immediately print those messages (hopefully). we also switch to printk_safe mode for call_console_drivers() in console_unlock(). but we can't make any solid assumptions there - serial console lock can already be acquired, we don't have any markers for that. it may be reasonable to assume that if we are not in printk_safe mode on this CPU then serial console is not locked from this CPU, but there is nothing that can assure us. so at the moment what I can think of is something like -- check this_cpu_read(printk_context) in NMI prink -- if we are NOT in printk_safe on this CPU, then do printk_deferred() and bypass `nmi_print_seq' buffer -- if we are in printk_safe -- well... bad luck... have a bigger buffer. .... or there are some other options here, but I'd prefer not to list them, because people will hate me :) so this shifts the purpose of `nmi_print_seq' buffer: use it only when you can't use logbuf. otherwise, do printk_deferred(). need to think more. -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-20 3:31 ` Sergey Senozhatsky @ 2017-04-20 13:11 ` Petr Mladek 2017-04-21 1:57 ` Sergey Senozhatsky ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Petr Mladek @ 2017-04-20 13:11 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu 2017-04-20 12:31:12, Sergey Senozhatsky wrote: > Hello Steven, > > On (04/19/17 13:13), Steven Rostedt wrote: > > > printk() takes some locks and could not be used a safe way in NMI context. > > > > I just found a problem with this solution. It kills ftrace dumps from > > NMI context :-( > > > > [ 1295.168495] <...>-67423 10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock > > [ 1295.168495] <...>-67423 10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns] > > [ 1295.168496] <...>-67423 10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock > > [ 1295.168497] > > [ 1295.168498] Lost 4890096 message(s)! > > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP > > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40) > > [ 1296.811553] Call Trace: > > [ 1296.811553] <NMI> > > > > I was hoping to see a cause of a hard lockup by enabling > > ftrace_dump_on_oops. But as NMIs now have a very small buffer that > > gets flushed, we need to find a new way to print out the full ftrace > > buffer over serial. > > > > Thoughts? > > hmmm... a really tough one. > > well, someone has to say this: > the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :) > > > just thinking (well, sort of) out loud. the problem is that we can't tell if > we already hold any printk related locks ("printk related locks" is not even > well defined term). so printk from NMI can deadlock or it can be OK, we > never know. but looking and vprintk_emit() and console_unlock() it seems that > we have some sort of a hint now, which is this_cpu_read(printk_context) - if > we are not in printk_safe context then we can say that _probably_ (and that's > a Russian roulette) doing "normal" printk() will work. that is a *very-very* > risky (and admittedly dumb) thing to assume, so we will move in a slightly > different direction. checking this_cpu_read(printk_context) only assures us > that we don't hold `logbuf_lock' on this CPU. and that is sort of something, > at least we can be sure that doing printk_deferred() from this CPU is safe. > printk_deferred() means that your NMI messages will end up in the logbuf, > which is a) bigger in size than per-CPU buffer and b) some other CPU can > immediately print those messages (hopefully). > > we also switch to printk_safe mode for call_console_drivers() in > console_unlock(). but we can't make any solid assumptions there - serial > console lock can already be acquired, we don't have any markers for that. > it may be reasonable to assume that if we are not in printk_safe mode on > this CPU then serial console is not locked from this CPU, but there is > nothing that can assure us. Good analyze. I would summarize it that we need to be careful of: + logbug_lock + PRINTK_SAFE_CONTEXT + locks used by console drivers The first two things are easy to check. Except that a check for logbuf_lock might produce false negatives. The last check is very hard. > so at the moment what I can think of is something like > > -- check this_cpu_read(printk_context) in NMI prink > > -- if we are NOT in printk_safe on this CPU, then do printk_deferred() > and bypass `nmi_print_seq' buffer I would add also a check for logbuf_lock. > -- if we are in printk_safe > -- well... bad luck... have a bigger buffer. Yup, we do the best effort while still trying to stay on the safe side. I have cooked up a patch based on this. It uses printk_deferred() in NMI when it is safe. Note that console_flush_on_panic() will try to get them on the console when a kdump is not generated. I believe that it will help Steven. >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001 From: Petr Mladek <pmladek@suse.com> Date: Thu, 20 Apr 2017 10:52:31 +0200 Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is available The commit 42a0bb3f71383b457a7d ("printk/nmi: generic solution for safe printk in NMI") caused that printk stores messages into a temporary buffer in NMI context. The buffer is per-CPU and therefore the size is rather limited. It works quite well for NMI backtraces. But there are longer logs that might get printed in NMI context, for example, lockdep warnings, ftrace_dump_on_oops. The temporary buffer is used to avoid deadlocks caused by logbuf_lock. Also it is needed to avoid races with the other temporary buffer that is used when PRINTK_SAFE_CONTEXT is entered. But the main buffer can be used in NMI if the lock is available and we did not interrupt PRINTK_SAFE_CONTEXT. The lock is checked using raw_spin_is_locked(). It might cause false negatives when the lock is taken on another CPU outside NMI. For this reason, we do the check in printk_nmi_enter(). It makes the handling consistent for the entire NMI handler and avoids reshuffling of the messages. The patch also defines special printk context that allows to use printk_deferred() in NMI. Note that we could not flush the messages to the consoles because console drivers might use many other internal locks. The newly created vprintk_deferred() disables the preemption only around the irq work handling. It is needed there to keep the consistency between the two per-CPU variables. But there is no reason to disable preemption around vprintk_emit(). Finally, the patch patch puts back explicit serialization of the NMI backtraces from different CPUs. It was removed by the commit a9edc88093287183ac934b ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). It was not needed because the flushing of the temporary per-CPU buffers was serialized. Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/internal.h | 6 ++++-- kernel/printk/printk.c | 19 ++++++++++++++----- kernel/printk/printk_safe.c | 25 +++++++++++++++++++++++-- lib/nmi_backtrace.c | 3 +++ 4 files changed, 44 insertions(+), 9 deletions(-) diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 1db044f808b7..2a7d04049af4 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -18,12 +18,14 @@ #ifdef CONFIG_PRINTK -#define PRINTK_SAFE_CONTEXT_MASK 0x7fffffff -#define PRINTK_NMI_CONTEXT_MASK 0x80000000 +#define PRINTK_SAFE_CONTEXT_MASK 0x3fffffff +#define PRINTK_NMI_DEFERRED_CONTEXT_MASK 0x40000000 +#define PRINTK_NMI_CONTEXT_MASK 0x80000000 extern raw_spinlock_t logbuf_lock; __printf(1, 0) int vprintk_default(const char *fmt, va_list args); +__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args); __printf(1, 0) int vprintk_func(const char *fmt, va_list args); void __printk_safe_enter(void); void __printk_safe_exit(void); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 2984fb0f0257..16b519927d35 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2691,16 +2691,13 @@ void wake_up_klogd(void) preempt_enable(); } -int printk_deferred(const char *fmt, ...) +int vprintk_deferred(const char *fmt, va_list args) { - va_list args; int r; - preempt_disable(); - va_start(args, fmt); r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - va_end(args); + preempt_disable(); __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); preempt_enable(); @@ -2708,6 +2705,18 @@ int printk_deferred(const char *fmt, ...) return r; } +int printk_deferred(const char *fmt, ...) +{ + va_list args; + int r; + + va_start(args, fmt); + r = vprintk_deferred(fmt, args); + va_end(args); + + return r; +} + /* * printk rate limiting, lifted from the networking subsystem. * diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 033e50a7d706..c3d165bcde42 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -308,12 +308,23 @@ static int vprintk_nmi(const char *fmt, va_list args) void printk_nmi_enter(void) { - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); + /* + * The size of the extra per-CPU buffer is limited. Use it + * only when really needed. + */ + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || + raw_spin_is_locked(&logbuf_lock)) { + this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); + } else { + this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); + } } void printk_nmi_exit(void) { - this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK); + this_cpu_and(printk_context, + ~(PRINTK_NMI_CONTEXT_MASK || + PRINTK_NMI_DEFERRED_CONTEXT_MASK)); } #else @@ -351,12 +362,22 @@ void __printk_safe_exit(void) __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { + /* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */ if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK) return vprintk_nmi(fmt, args); + /* Use extra buffer to prevent a recursion deadlock in safe mode. */ if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) return vprintk_safe(fmt, args); + /* + * Use the main logbuf when logbuf_lock is available in NMI. + * But avoid calling console drivers that might have their own locks. + */ + if (this_cpu_read(printk_context) & PRINTK_NMI_DEFERRED_CONTEXT_MASK) + return vprintk_deferred(fmt, args); + + /* No obstacles. */ return vprintk_default(fmt, args); } diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c index 4e8a30d1c22f..0bc0a3535a8a 100644 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, bool nmi_cpu_backtrace(struct pt_regs *regs) { + static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; int cpu = smp_processor_id(); if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { + arch_spin_lock(&lock); if (regs && cpu_in_idle(instruction_pointer(regs))) { pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n", cpu, instruction_pointer(regs)); @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs) else dump_stack(); } + arch_spin_unlock(&lock); cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); return true; } -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-20 13:11 ` Petr Mladek @ 2017-04-21 1:57 ` Sergey Senozhatsky 2017-04-21 12:06 ` Petr Mladek 2017-04-27 16:14 ` Steven Rostedt 2017-04-28 1:25 ` Sergey Senozhatsky 2 siblings, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2017-04-21 1:57 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller Hello, On (04/20/17 15:11), Petr Mladek wrote: [..] > Good analyze. I would summarize it that we need to be careful of: > > + logbug_lock > + PRINTK_SAFE_CONTEXT > + locks used by console drivers > > The first two things are easy to check. Except that a check for logbuf_lock > might produce false negatives. The last check is very hard. > > > so at the moment what I can think of is something like > > > > -- check this_cpu_read(printk_context) in NMI prink > > > > -- if we are NOT in printk_safe on this CPU, then do printk_deferred() > > and bypass `nmi_print_seq' buffer > > I would add also a check for logbuf_lock. > > -- if we are in printk_safe > > -- well... bad luck... have a bigger buffer. > > Yup, we do the best effort while still trying to stay on the safe > side. > > I have cooked up a patch based on this. It uses printk_deferred() > in NMI when it is safe. Note that console_flush_on_panic() will > try to get them on the console when a kdump is not generated. > I believe that it will help Steven. OK. I need to look more at the patch. It does more than I'd expected/imagined. [..] > void printk_nmi_enter(void) > { > - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > + /* > + * The size of the extra per-CPU buffer is limited. Use it > + * only when really needed. > + */ > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || > + raw_spin_is_locked(&logbuf_lock)) { > + this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > + } else { > + this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); > + } > } well... the logbuf_lock can temporarily be locked from another CPU. I'd say that spin_is_locked() has better chances for false positive than this_cpu_read(printk_context). because this_cpu_read(printk_context) depends only on this CPU state, while spin_is_locked() depends on all CPUs. and the idea with this_cpu_read(printk_context) was that we check if the logbuf_lock was locked from this particular CPU. I agree that this_cpu_read(printk_context) covers slightly more than logbuf_lock scope, so we may get positive this_cpu_read(printk_context) with unlocked logbuf_lock, but I don't tend to think that it's a big problem. wouldn't something as simple as below do the trick? // absolutely and completely untested // diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 033e50a7d706..c7477654c5b1 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); - return printk_safe_log_store(s, fmt, args); + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) + return printk_safe_log_store(s, fmt, args); + + return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); } -ss ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-21 1:57 ` Sergey Senozhatsky @ 2017-04-21 12:06 ` Petr Mladek 2017-04-24 2:17 ` Sergey Senozhatsky 0 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2017-04-21 12:06 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri 2017-04-21 10:57:25, Sergey Senozhatsky wrote: > On (04/20/17 15:11), Petr Mladek wrote: > [..] > > void printk_nmi_enter(void) > > { > > - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > > + /* > > + * The size of the extra per-CPU buffer is limited. Use it > > + * only when really needed. > > + */ > > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || > > + raw_spin_is_locked(&logbuf_lock)) { > > + this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > > + } else { > > + this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); > > + } > > } > > well... the logbuf_lock can temporarily be locked from another CPU. I'd say > that spin_is_locked() has better chances for false positive than > this_cpu_read(printk_context). because this_cpu_read(printk_context) depends > only on this CPU state, while spin_is_locked() depends on all CPUs. and the > idea with this_cpu_read(printk_context) was that we check if the logbuf_lock > was locked from this particular CPU. I finally see the point. I confess that I did not think much about this way yesterday because it looked too tricky (indirect) and possibly error prone. > I agree that this_cpu_read(printk_context) covers slightly more than > logbuf_lock scope, so we may get positive this_cpu_read(printk_context) > with unlocked logbuf_lock, but I don't tend to think that it's a big > problem. PRINTK_SAFE_CONTEXT is set also in call_console_drivers(). It might take rather long and logbuf_lock is availe. So, it is noticeable source of false positives. Hmm, my version actually checks both the lock and the context. It is very deffensive to be always on the safe side. We could get the best restults with both checks and by using "&&": void printk_nmi_enter(void) { /* * The size of the extra per-CPU buffer is limited. Use it * only when the lock for the main log buffer is not * available. * * logbuf_lock might be taken on another CPU. But it must be * in PRINTK_SAFE context. Reduce false positives by a check * of the context. */ if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || raw_spin_is_locked(&logbuf_lock)) { this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); } else { this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); } } But after all, I would prefer to keep it simple, strightforward, and check only the logbuf_lock: if (raw_spin_is_locked(&logbuf_lock)) this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); else this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); > wouldn't something as simple as below do the trick? > // absolutely and completely untested // > > > diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c > index 033e50a7d706..c7477654c5b1 100644 > --- a/kernel/printk/printk_safe.c > +++ b/kernel/printk/printk_safe.c > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args) > { > struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); > > - return printk_safe_log_store(s, fmt, args); > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) > + return printk_safe_log_store(s, fmt, args); > + > + return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); > } It looks simple but some things are missing. It will be used also outside panic/oops, so it should queue the irq_work to flush the console. Also the serialization of nmi_cpu_backtrace() backtrace calls is needed. All in all, we could get rid only of the extra PRINTK_NMI_DEFERRED_CONTEXT_MASK with this approach. Other than that it looks more tricky to me. Sigh, I hate problems without the single best solution. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-21 12:06 ` Petr Mladek @ 2017-04-24 2:17 ` Sergey Senozhatsky 2017-04-27 13:38 ` Petr Mladek 0 siblings, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2017-04-24 2:17 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On (04/21/17 14:06), Petr Mladek wrote: [..] > > I agree that this_cpu_read(printk_context) covers slightly more than > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context) > > with unlocked logbuf_lock, but I don't tend to think that it's a big > > problem. > > PRINTK_SAFE_CONTEXT is set also in call_console_drivers(). > It might take rather long and logbuf_lock is availe. So, it is > noticeable source of false positives. yes, agree. probably we need additional printk_safe annotations for "logbuf_lock is locked from _this_ CPU" false positives there can be very painful. [..] > if (raw_spin_is_locked(&logbuf_lock)) > this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > else > this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); well, if everyone is fine with logbuf_lock access from every CPU from every NMI then I won't object either. but may be it makes sense to reduce the possibility of false positives. Steven is loosing critically important logs, after all. by the way, does this `nmi_print_seq' bypass even fix anything for Steven? it sort of can, in theory, but just in theory. so may be we need direct message flush from NMI handler (printk->console_unlock), which will be a really big problem. logbuf might not be big enough for 4890096 messages (Steven's report mentions "Lost 4890096 message(s)!"). we are counting on the fact that in case of `nmi_print_seq' bypass some other CPU will call console_unlock() and print pending logbuf messages, but this is not guaranteed and the messages can be dropped even from logbuf. I don't know, should we try to queue printk_deferred irq_work for all online CPUs from vprintk_nmi() when it bypasses printk_safe_log_store()? in order to minimize possibilities of logbuf overflow. printk_deferred() will queue work on vprintk_nmi() CPU, sure, but we don't know how many messages we are going to add to logbuf from NMI. > > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args) > > { > > struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); > > > > - return printk_safe_log_store(s, fmt, args); > > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) > > + return printk_safe_log_store(s, fmt, args); > > + > > + return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); > > } > > It looks simple but some things are missing. It will be used also > outside panic/oops, so it should queue the irq_work to flush the console. you are right. I thought about moving irq_work to vprintk_emit(), but completely forgot about it. without that missing bit the proposed two-liner is not complete. -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-24 2:17 ` Sergey Senozhatsky @ 2017-04-27 13:38 ` Petr Mladek 2017-04-27 14:31 ` Steven Rostedt 2017-04-28 9:02 ` Peter Zijlstra 0 siblings, 2 replies; 32+ messages in thread From: Petr Mladek @ 2017-04-27 13:38 UTC (permalink / raw) To: Sergey Senozhatsky, Steven Rostedt Cc: Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote: > On (04/21/17 14:06), Petr Mladek wrote: > [..] > > > I agree that this_cpu_read(printk_context) covers slightly more than > > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context) > > > with unlocked logbuf_lock, but I don't tend to think that it's a big > > > problem. > > > > PRINTK_SAFE_CONTEXT is set also in call_console_drivers(). > > It might take rather long and logbuf_lock is availe. So, it is > > noticeable source of false positives. > > yes, agree. > > probably we need additional printk_safe annotations for > "logbuf_lock is locked from _this_ CPU" > > false positives there can be very painful. > > [..] > > if (raw_spin_is_locked(&logbuf_lock)) > > this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > > else > > this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); > > well, if everyone is fine with logbuf_lock access from every CPU from every > NMI then I won't object either. but may be it makes sense to reduce the > possibility of false positives. Steven is loosing critically important logs, > after all. > > > by the way, > does this `nmi_print_seq' bypass even fix anything for Steven? I think that this is the most important question. Steven, does the patch from https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz help you to see the debug messages, please? > it sort of > can, in theory, but just in theory. so may be we need direct message flush > from NMI handler (printk->console_unlock), which will be a really big problem. I thought about it a lot and got scared where this might go. We need to balance the usefulness and the complexity of the solution. It took one year to discover this regression. Before it was suggested to avoid calling printk() in NMI context at all. Now, we are trying to fix printk() to handle MBs of messages in NMI context. If my proposed patch solves the problem for Steven, I would still like to get similar solution in. It is not that complex and helps to bypass the limited per-CPU buffer in most cases. I always thought that 8kB might be not enough in some cases. Note that my patch is very defensive. It uses the main log buffer only when it is really safe. It has higher potential for unneeded fallback but if it works for Steven (really existing usecase), ... On the other hand, I would prefer to avoid any much more complex solution until we have a real reports that they are needed. Also we need to look for alternatives. There is a chance to create crashdump and get the ftrace messages from it. Also this might be scenario when we might need to suggest the early_printk() patchset from Peter Zijlstra. > logbuf might not be big enough for 4890096 messages (Steven's report > mentions "Lost 4890096 message(s)!"). we are counting on the fact that > in case of `nmi_print_seq' bypass some other CPU will call console_unlock() > and print pending logbuf messages, but this is not guaranteed and the > messages can be dropped even from logbuf. Yup. I tested the patch here and I needed to increase the main log buffer size to see all ftrace messages. Fortunately, it was possible to use a really huge global buffer. But it is not realistic to use huge per-CPU ones. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-27 13:38 ` Petr Mladek @ 2017-04-27 14:31 ` Steven Rostedt 2017-04-27 15:28 ` Petr Mladek 2017-04-28 9:02 ` Peter Zijlstra 1 sibling, 1 reply; 32+ messages in thread From: Steven Rostedt @ 2017-04-27 14:31 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, 27 Apr 2017 15:38:19 +0200 Petr Mladek <pmladek@suse.com> wrote: > > by the way, > > does this `nmi_print_seq' bypass even fix anything for Steven? > > I think that this is the most important question. > > Steven, does the patch from > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz > help you to see the debug messages, please? You'll have to wait for a bit. The box that I was debugging takes 45 minutes to reboot. And I don't have much more time to play on it before I have to give it back. I already found the bug I was looking for and I'm trying not to crash it again (due to the huge bring up time). When I get a chance, I'll see if I can insert a trigger to crash the kernel from NMI on another box and see if this patch helps. Thanks, -- Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-27 14:31 ` Steven Rostedt @ 2017-04-27 15:28 ` Petr Mladek 2017-04-27 15:42 ` Steven Rostedt 0 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2017-04-27 15:28 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu 2017-04-27 10:31:18, Steven Rostedt wrote: > On Thu, 27 Apr 2017 15:38:19 +0200 > Petr Mladek <pmladek@suse.com> wrote: > > > > by the way, > > > does this `nmi_print_seq' bypass even fix anything for Steven? > > > > I think that this is the most important question. > > > > Steven, does the patch from > > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz > > help you to see the debug messages, please? > > You'll have to wait for a bit. The box that I was debugging takes 45 > minutes to reboot. And I don't have much more time to play on it before > I have to give it back. I already found the bug I was looking for and > I'm trying not to crash it again (due to the huge bring up time). I see. > When I get a chance, I'll see if I can insert a trigger to crash the > kernel from NMI on another box and see if this patch helps. I actually tested it here using this hack: diff --cc lib/nmi_backtrace.c index d531f85c0c9b,0bc0a3535a8a..000000000000 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs * int cpu = smp_processor_id(); if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { + if (in_nmi()) + panic("Simulating panic in NMI\n"); + arch_spin_lock(&lock); if (regs && cpu_in_idle(instruction_pointer(regs))) { pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n", cpu, instruction_pointer(regs)); and triggered by: echo l > /proc/sysrq-trigger The patch really helped to see much more (all) messages from the ftrace buffers in NMI mode. But the test is a bit artifical. The patch might not help when there is a big printk() activity on the system when the panic() is triggered. We might wrongly use the small per-CPU buffer when the logbuf_lock is tested and taken on another CPU at the same time. It means that it will not always help. I personally think that the patch might be good enough. I am not sure if a perfect (more comlpex) solution is worth it. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-27 15:28 ` Petr Mladek @ 2017-04-27 15:42 ` Steven Rostedt 0 siblings, 0 replies; 32+ messages in thread From: Steven Rostedt @ 2017-04-27 15:42 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, 27 Apr 2017 17:28:07 +0200 Petr Mladek <pmladek@suse.com> wrote: > > When I get a chance, I'll see if I can insert a trigger to crash the > > kernel from NMI on another box and see if this patch helps. > > I actually tested it here using this hack: > > diff --cc lib/nmi_backtrace.c > index d531f85c0c9b,0bc0a3535a8a..000000000000 > --- a/lib/nmi_backtrace.c > +++ b/lib/nmi_backtrace.c > @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs * > int cpu = smp_processor_id(); > > if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { > + if (in_nmi()) > + panic("Simulating panic in NMI\n"); > + arch_spin_lock(&lock); I was going to create a ftrace trigger, to crash on demand, but this may do as well. > if (regs && cpu_in_idle(instruction_pointer(regs))) { > pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n", > cpu, instruction_pointer(regs)); > > and triggered by: > > echo l > /proc/sysrq-trigger > > The patch really helped to see much more (all) messages from the ftrace > buffers in NMI mode. > > But the test is a bit artifical. The patch might not help when there > is a big printk() activity on the system when the panic() is > triggered. We might wrongly use the small per-CPU buffer when > the logbuf_lock is tested and taken on another CPU at the same time. > It means that it will not always help. > > I personally think that the patch might be good enough. I am not sure > if a perfect (more comlpex) solution is worth it. I wasn't asking for perfect, as the previous solutions never were either. I just want an optimistic dump if possible. I'll try to get some time today to test this, and let you know. But it wont be on the machine that I originally had the issue with. Thanks, -- Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-27 13:38 ` Petr Mladek 2017-04-27 14:31 ` Steven Rostedt @ 2017-04-28 9:02 ` Peter Zijlstra 2017-04-28 13:44 ` Petr Mladek 1 sibling, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2017-04-28 9:02 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote: > Also we need to look for alternatives. There is a chance > to create crashdump and get the ftrace messages from it. > Also this might be scenario when we might need to suggest > the early_printk() patchset from Peter Zijlstra. I'd be happy to repost those. I still carry them in my tree. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 9:02 ` Peter Zijlstra @ 2017-04-28 13:44 ` Petr Mladek 2017-04-28 13:58 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2017-04-28 13:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote: > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote: > > Also we need to look for alternatives. There is a chance > > to create crashdump and get the ftrace messages from it. > > Also this might be scenario when we might need to suggest > > the early_printk() patchset from Peter Zijlstra. > > I'd be happy to repost those. I still carry them in my tree. You do not need to if they are still the same as https://lkml.kernel.org/r/20161018170830.405990950@infradead.org I rather do not promise anything but I would like to look at them within next few weeks (after the merge window). Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 13:44 ` Petr Mladek @ 2017-04-28 13:58 ` Peter Zijlstra 2017-04-28 14:47 ` Steven Rostedt 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2017-04-28 13:58 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote: > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote: > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote: > > > Also we need to look for alternatives. There is a chance > > > to create crashdump and get the ftrace messages from it. > > > Also this might be scenario when we might need to suggest > > > the early_printk() patchset from Peter Zijlstra. > > > > I'd be happy to repost those. I still carry them in my tree. > > You do not need to if they are still the same as > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org More or less, I think I fixed an intermediate compile fail reported by you and the 0-day thing and I wrote a few comments in the last patch. They live here fwiw: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp > I rather do not promise anything but I would like to look at them > within next few weeks (after the merge window). Sure, I'll post them again during/after the merge window. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 13:58 ` Peter Zijlstra @ 2017-04-28 14:47 ` Steven Rostedt 0 siblings, 0 replies; 32+ messages in thread From: Steven Rostedt @ 2017-04-28 14:47 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri, 28 Apr 2017 15:58:10 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote: > > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote: > > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote: > > > > Also we need to look for alternatives. There is a chance > > > > to create crashdump and get the ftrace messages from it. > > > > Also this might be scenario when we might need to suggest > > > > the early_printk() patchset from Peter Zijlstra. > > > > > > I'd be happy to repost those. I still carry them in my tree. > > > > You do not need to if they are still the same as > > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org > > More or less, I think I fixed an intermediate compile fail reported by > you and the 0-day thing and I wrote a few comments in the last patch. > > They live here fwiw: > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp > > > I rather do not promise anything but I would like to look at them > > within next few weeks (after the merge window). > > Sure, I'll post them again during/after the merge window. BTW, this is what I ended up using to get the debug information I needed. -- Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-20 13:11 ` Petr Mladek 2017-04-21 1:57 ` Sergey Senozhatsky @ 2017-04-27 16:14 ` Steven Rostedt 2017-04-28 1:35 ` Sergey Senozhatsky 2017-04-28 1:25 ` Sergey Senozhatsky 2 siblings, 1 reply; 32+ messages in thread From: Steven Rostedt @ 2017-04-27 16:14 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, 20 Apr 2017 15:11:54 +0200 Petr Mladek <pmladek@suse.com> wrote: > > >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001 > From: Petr Mladek <pmladek@suse.com> > Date: Thu, 20 Apr 2017 10:52:31 +0200 > Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is > available I tried this patch. It's better because I get the end of the trace, but I do lose the beginning of it: ** 196358 printk messages dropped ** [ 102.321182] perf-5981 0.... 12983650us : d_path <-seq_path The way I tested it was by adding this: Index: linux-trace.git/kernel/trace/trace_functions.c =================================================================== --- linux-trace.git.orig/kernel/trace/trace_functions.c +++ linux-trace.git/kernel/trace/trace_functions.c @@ -469,8 +469,11 @@ ftrace_cpudump_probe(unsigned long ip, u struct trace_array *tr, struct ftrace_probe_ops *ops, void *data) { - if (update_count(ops, ip, data)) - ftrace_dump(DUMP_ORIG); + char *killer = NULL; + + panic_on_oops = 1; /* force panic */ + wmb(); + *killer = 1; } static int Then doing the following: # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops # trace-cmd start -p function # echo nmi_handle:cpudump > /debug/tracing/set_ftrace_filter # perf record -c 100 -a sleep 1 And that triggers the crash. -- Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-27 16:14 ` Steven Rostedt @ 2017-04-28 1:35 ` Sergey Senozhatsky 2017-04-28 12:57 ` Petr Mladek 0 siblings, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2017-04-28 1:35 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On (04/27/17 12:14), Steven Rostedt wrote: [..] > I tried this patch. It's better because I get the end of the trace, but > I do lose the beginning of it: > > ** 196358 printk messages dropped ** [ 102.321182] perf-5981 0.... 12983650us : d_path <-seq_path many thanks! so we now drop messages from logbuf, not from per-CPU buffers. that "queue printk_deferred irq_work on every online CPU when we bypass per-CPU buffers from NMI" idea *probably* might help here - we need someone to emit messages from the logbuf while we printk from NMI. there is still a possibility that we can drop messages, though, since log_store() from NMI CPU can be much-much faster than call_console_drivers() on other CPU. -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 1:35 ` Sergey Senozhatsky @ 2017-04-28 12:57 ` Petr Mladek 2017-04-28 14:16 ` Steven Rostedt 0 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2017-04-28 12:57 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri 2017-04-28 10:35:32, Sergey Senozhatsky wrote: > On (04/27/17 12:14), Steven Rostedt wrote: > [..] > > I tried this patch. It's better because I get the end of the trace, but > > I do lose the beginning of it: > > > > ** 196358 printk messages dropped ** [ 102.321182] perf-5981 0.... 12983650us : d_path <-seq_path > > many thanks! > > so we now drop messages from logbuf, not from per-CPU buffers. that > "queue printk_deferred irq_work on every online CPU when we bypass per-CPU > buffers from NMI" idea *probably* might help here - we need someone to emit > messages from the logbuf while we printk from NMI. there is still a > possibility that we can drop messages, though, since log_store() from NMI > CPU can be much-much faster than call_console_drivers() on other CPU. ftrace log is dumped via trace_panic_notifier. It is done after smp_send_stop(). It means that only a single CPU is available and it is NMI context at the moment. One possibility might be to put printk into a special mode and drop the last messages instead of the first ones. But this would need to be configurable. Of course, if the problem is reproducible, the easiest solution is to use bigger main log buffer, for example boot with log_buf_len=32M. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 12:57 ` Petr Mladek @ 2017-04-28 14:16 ` Steven Rostedt 0 siblings, 0 replies; 32+ messages in thread From: Steven Rostedt @ 2017-04-28 14:16 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri, 28 Apr 2017 14:57:25 +0200 Petr Mladek <pmladek@suse.com> wrote: > Of course, if the problem is reproducible, the easiest solution > is to use bigger main log buffer, for example boot with > log_buf_len=32M. Of course that may not be enough. Especially when I have a machine with 240 CPUs. But it also has a ton of RAM, I could easily do log_buf_len=32G -- Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-20 13:11 ` Petr Mladek 2017-04-21 1:57 ` Sergey Senozhatsky 2017-04-27 16:14 ` Steven Rostedt @ 2017-04-28 1:25 ` Sergey Senozhatsky 2017-04-28 12:38 ` Petr Mladek 2 siblings, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2017-04-28 1:25 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On (04/20/17 15:11), Petr Mladek wrote: [..] > void printk_nmi_enter(void) > { > - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > + /* > + * The size of the extra per-CPU buffer is limited. Use it > + * only when really needed. > + */ > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || > + raw_spin_is_locked(&logbuf_lock)) { can we please have && here? [..] > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c > index 4e8a30d1c22f..0bc0a3535a8a 100644 > --- a/lib/nmi_backtrace.c > +++ b/lib/nmi_backtrace.c > @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, > > bool nmi_cpu_backtrace(struct pt_regs *regs) > { > + static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; > int cpu = smp_processor_id(); > > if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { > + arch_spin_lock(&lock); > if (regs && cpu_in_idle(instruction_pointer(regs))) { > pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n", > cpu, instruction_pointer(regs)); > @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs) > else > dump_stack(); > } > + arch_spin_unlock(&lock); > cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); > return true; > } can the nmi_backtrace part be a patch on its own? -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI 2017-04-28 1:25 ` Sergey Senozhatsky @ 2017-04-28 12:38 ` Petr Mladek 0 siblings, 0 replies; 32+ messages in thread From: Petr Mladek @ 2017-04-28 12:38 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Fri 2017-04-28 10:25:30, Sergey Senozhatsky wrote: > > On (04/20/17 15:11), Petr Mladek wrote: > [..] > > void printk_nmi_enter(void) > > { > > - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > > + /* > > + * The size of the extra per-CPU buffer is limited. Use it > > + * only when really needed. > > + */ > > + if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK || > > + raw_spin_is_locked(&logbuf_lock)) { > > can we please have && here? OK, it sounds reasonable after all. > [..] > > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c > > index 4e8a30d1c22f..0bc0a3535a8a 100644 > > --- a/lib/nmi_backtrace.c > > +++ b/lib/nmi_backtrace.c > > @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, > > > > bool nmi_cpu_backtrace(struct pt_regs *regs) > > { > > + static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; > > int cpu = smp_processor_id(); > > > > if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { > > + arch_spin_lock(&lock); > > if (regs && cpu_in_idle(instruction_pointer(regs))) { > > pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n", > > cpu, instruction_pointer(regs)); > > @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs) > > else > > dump_stack(); > > } > > + arch_spin_unlock(&lock); > > cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); > > return true; > > } > > can the nmi_backtrace part be a patch on its own? I would prefer to keep it in the same patch. The backtrace from all CPUs is completely unusable when all CPUs push to the global log buffer in parallel. Single patch might safe hair of some poor bisectors. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek @ 2016-04-21 11:48 ` Petr Mladek 2016-04-27 9:34 ` Russell King - ARM Linux 2016-04-21 11:48 ` [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek ` (2 subsequent siblings) 4 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller We could not resize the temporary buffer in NMI context. Let's warn if a message is lost. This is rather theoretical. printk() should not be used in NMI. The only sensible use is when we want to print backtrace from all CPUs. The current buffer should be enough for this purpose. [akpm@linux-foundation.org: whitespace fixlet] Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- kernel/printk/internal.h | 11 +++++++++++ kernel/printk/nmi.c | 5 ++++- kernel/printk/printk.c | 10 ++++++++++ 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 2de99faedfc1..341bedccc065 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) return this_cpu_read(printk_func)(fmt, args); } +extern atomic_t nmi_message_lost; +static inline int get_nmi_message_lost(void) +{ + return atomic_xchg(&nmi_message_lost, 0); +} + #else /* CONFIG_PRINTK_NMI */ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) @@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) return vprintk_default(fmt, args); } +static inline int get_nmi_message_lost(void) +{ + return 0; +} + #endif /* CONFIG_PRINTK_NMI */ diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index 303cf0d15e57..572f94922230 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -39,6 +39,7 @@ */ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default; static int printk_nmi_irq_ready; +atomic_t nmi_message_lost; #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work)) @@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args) again: len = atomic_read(&s->len); - if (len >= sizeof(s->buffer)) + if (len >= sizeof(s->buffer)) { + atomic_inc(&nmi_message_lost); return 0; + } /* * Make sure that all old data have been read before the buffer was diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 71eba0607034..e38579d730f4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1617,6 +1617,7 @@ asmlinkage int vprintk_emit(int facility, int level, unsigned long flags; int this_cpu; int printed_len = 0; + int nmi_message_lost; bool in_sched = false; /* cpu currently holding logbuf_lock in this function */ static unsigned int logbuf_cpu = UINT_MAX; @@ -1667,6 +1668,15 @@ asmlinkage int vprintk_emit(int facility, int level, strlen(recursion_msg)); } + nmi_message_lost = get_nmi_message_lost(); + if (unlikely(nmi_message_lost)) { + text_len = scnprintf(textbuf, sizeof(textbuf), + "BAD LUCK: lost %d message(s) from NMI context!", + nmi_message_lost); + printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0, + NULL, 0, textbuf, text_len); + } + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context 2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek @ 2016-04-27 9:34 ` Russell King - ARM Linux 0 siblings, 0 replies; 32+ messages in thread From: Russell King - ARM Linux @ 2016-04-27 9:34 UTC (permalink / raw) To: Petr Mladek Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller On Thu, Apr 21, 2016 at 01:48:43PM +0200, Petr Mladek wrote: > @@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args) > again: > len = atomic_read(&s->len); > > - if (len >= sizeof(s->buffer)) > + if (len >= sizeof(s->buffer)) { > + atomic_inc(&nmi_message_lost); This should be fine - I think its reasonable to expect that no one is using a spinlock implementation for their atomic ops for this... -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek 2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek @ 2016-04-21 11:48 ` Petr Mladek 2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek 2016-04-27 0:36 ` [PATCH v5 0/4] Cleaning printk stuff in NMI context Sergey Senozhatsky 4 siblings, 0 replies; 32+ messages in thread From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara, Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky, David Miller Testing has shown that the backtrace sometimes does not fit into the 4kB temporary buffer that is used in NMI context. The warnings are gone when I double the temporary buffer size. This patch doubles the buffer size and makes it configurable. Note that this problem existed even in the x86-specific implementation that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). Nobody noticed it because it did not print any warnings. Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- init/Kconfig | 22 ++++++++++++++++++++++ kernel/printk/nmi.c | 3 ++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/init/Kconfig b/init/Kconfig index 85c7a2bf1ea4..9dfd27394d43 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -862,6 +862,28 @@ config LOG_CPU_MAX_BUF_SHIFT 13 => 8 KB for each CPU 12 => 4 KB for each CPU +config NMI_LOG_BUF_SHIFT + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" + range 10 21 + default 13 + depends on PRINTK_NMI + help + Select the size of a per-CPU buffer where NMI messages are temporary + stored. They are copied to the main log buffer in a safe context + to avoid a deadlock. The value defines the size as a power of 2. + + NMI messages are rare and limited. The largest one is when + a backtrace is printed. It usually fits into 4KB. Select + 8KB if you want to be on the safe side. + + Examples: + 17 => 128 KB for each CPU + 16 => 64 KB for each CPU + 15 => 32 KB for each CPU + 14 => 16 KB for each CPU + 13 => 8 KB for each CPU + 12 => 4 KB for each CPU + # # Architectures with an unreliable sched_clock() should select this: # diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index 572f94922230..bf08557d7e3d 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default; static int printk_nmi_irq_ready; atomic_t nmi_message_lost; -#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work)) +#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) - \ + sizeof(atomic_t) - sizeof(struct irq_work)) struct nmi_seq_buf { atomic_t len; /* length of written data */ -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek ` (2 preceding siblings ...) 2016-04-21 11:48 ` [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek @ 2016-04-21 11:48 ` Petr Mladek 2016-04-23 3:49 ` Sergey Senozhatsky 2016-04-27 0:36 ` [PATCH v5 0/4] Cleaning printk stuff in NMI context Sergey Senozhatsky 4 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux, Petr Mladek In NMI context, printk() messages are stored into per-CPU buffers to avoid a possible deadlock. They are normally flushed to the main ring buffer via an IRQ work. But the work is never called when the system calls panic() in the very same NMI handler. This patch tries to flush NMI buffers before the crash dump is generated. In this case it does not risk a double release and bails out when the logbuf_lock is already taken. The aim is to get the messages into the main ring buffer when possible. It makes them better accessible in the vmcore. Then the patch tries to flush the buffers second time when other CPUs are down. It might be more aggressive and reset logbuf_lock. The aim is to get the messages available for the consequent kmsg_dump() and console_flush_on_panic() calls. The patch causes vprintk_emit() to be called even in NMI context again. But it is done via printk_deferred() so that the console handling is skipped. Consoles use internal locks and we could not prevent a deadlock easily. They are explicitly called later when the crash dump is not generated, see console_flush_on_panic(). Signed-off-by: Petr Mladek <pmladek@suse.com> --- include/linux/printk.h | 2 ++ kernel/kexec_core.c | 1 + kernel/panic.c | 6 +++++- kernel/printk/internal.h | 2 ++ kernel/printk/nmi.c | 39 ++++++++++++++++++++++++++++++++++++++- kernel/printk/printk.c | 2 +- 6 files changed, 49 insertions(+), 3 deletions(-) diff --git a/include/linux/printk.h b/include/linux/printk.h index 51dd6b824fe2..f4da695fd615 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -127,11 +127,13 @@ extern void printk_nmi_init(void); extern void printk_nmi_enter(void); extern void printk_nmi_exit(void); extern void printk_nmi_flush(void); +extern void printk_nmi_flush_on_panic(void); #else static inline void printk_nmi_init(void) { } static inline void printk_nmi_enter(void) { } static inline void printk_nmi_exit(void) { } static inline void printk_nmi_flush(void) { } +static inline void printk_nmi_flush_on_panic(void) { } #endif /* PRINTK_NMI */ #ifdef CONFIG_PRINTK diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 8d34308ea449..1dc3fe8495e0 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs) old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu); if (old_cpu == PANIC_CPU_INVALID) { /* This is the 1st CPU which comes here, so go ahead. */ + printk_nmi_flush_on_panic(); __crash_kexec(regs); /* diff --git a/kernel/panic.c b/kernel/panic.c index 535c96510a44..8aa74497cc5a 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -160,8 +160,10 @@ void panic(const char *fmt, ...) * * Bypass the panic_cpu check and call __crash_kexec directly. */ - if (!crash_kexec_post_notifiers) + if (!crash_kexec_post_notifiers) { + printk_nmi_flush_on_panic(); __crash_kexec(NULL); + } /* * Note smp_send_stop is the usual smp shutdown function, which @@ -176,6 +178,8 @@ void panic(const char *fmt, ...) */ atomic_notifier_call_chain(&panic_notifier_list, 0, buf); + /* Call flush even twice. It tries harder with a single online CPU */ + printk_nmi_flush_on_panic(); kmsg_dump(KMSG_DUMP_PANIC); /* diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 341bedccc065..7fd2838fa417 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args); #ifdef CONFIG_PRINTK_NMI +extern raw_spinlock_t logbuf_lock; + /* * printk() could not take logbuf_lock in NMI context. Instead, * it temporary stores the strings into a per-CPU buffer. diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index bf08557d7e3d..b69eb8a2876f 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -17,6 +17,7 @@ #include <linux/preempt.h> #include <linux/spinlock.h> +#include <linux/debug_locks.h> #include <linux/smp.h> #include <linux/cpumask.h> #include <linux/irq_work.h> @@ -106,7 +107,16 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end) { const char *buf = s->buffer + start; - printk("%.*s", (end - start) + 1, buf); + /* + * The buffers are flushed in NMI only on panic. The messages must + * go only into the ring buffer at this stage. Consoles will get + * explicitly called later when a crashdump is not generated. + */ + if (in_nmi()) + printk_deferred("%.*s", (end - start) + 1, buf); + else + printk("%.*s", (end - start) + 1, buf); + } /* @@ -194,6 +204,33 @@ void printk_nmi_flush(void) __printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work); } +/** + * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system + * goes down. + * + * Similar to printk_nmi_flush() but it can be called even in NMI context when + * the system goes down. It does the best effort to get NMI messages into + * the main ring buffer. + * + * Note that it could try harder when there is only one CPU online. + */ +void printk_nmi_flush_on_panic(void) +{ + /* + * Make sure that we could access the main ring buffer. + * Do not risk a double release when more CPUs are up. + */ + if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) { + if (num_online_cpus() > 1) + return; + + debug_locks_off(); + raw_spin_lock_init(&logbuf_lock); + } + + printk_nmi_flush(); +} + void __init printk_nmi_init(void) { int cpu; diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index e38579d730f4..60cdf6386763 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -245,7 +245,7 @@ __packed __aligned(4) * within the scheduler's rq lock. It must be released before calling * console_unlock() or anything else that might wake up a process. */ -static DEFINE_RAW_SPINLOCK(logbuf_lock); +DEFINE_RAW_SPINLOCK(logbuf_lock); #ifdef CONFIG_PRINTK DECLARE_WAIT_QUEUE_HEAD(log_wait); -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic 2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek @ 2016-04-23 3:49 ` Sergey Senozhatsky 2016-04-26 14:21 ` Petr Mladek 0 siblings, 1 reply; 32+ messages in thread From: Sergey Senozhatsky @ 2016-04-23 3:49 UTC (permalink / raw) To: Petr Mladek Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux Hello Petr, On (04/21/16 13:48), Petr Mladek wrote: > extern void printk_nmi_flush(void); > +extern void printk_nmi_flush_on_panic(void); > #else > static inline void printk_nmi_flush(void) { } > +static inline void printk_nmi_flush_on_panic(void) { } [..] > +void printk_nmi_flush_on_panic(void) > +{ > + /* > + * Make sure that we could access the main ring buffer. > + * Do not risk a double release when more CPUs are up. > + */ > + if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) { > + if (num_online_cpus() > 1) > + return; > + > + debug_locks_off(); > + raw_spin_lock_init(&logbuf_lock); > + } > + > + printk_nmi_flush(); > +} [..] > -static DEFINE_RAW_SPINLOCK(logbuf_lock); > +DEFINE_RAW_SPINLOCK(logbuf_lock); just an idea, how about doing it a bit differently? move printk_nmi_flush_on_panic() to printk.c, and place it next to printk_flush_on_panic() (so we will have two printk "flush-on-panic" functions sitting together). /* printk_nmi_flush() is in printk.h, so it's visible to printk anyway */ it also will let us keep logbuf_lock static, it's a bit too internal to printk to expose it, I think. IOW, something like this? --- kernel/printk/internal.h | 2 -- kernel/printk/nmi.c | 27 --------------------------- kernel/printk/printk.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 28 insertions(+), 30 deletions(-) diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 7fd2838..341bedc 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args); #ifdef CONFIG_PRINTK_NMI -extern raw_spinlock_t logbuf_lock; - /* * printk() could not take logbuf_lock in NMI context. Instead, * it temporary stores the strings into a per-CPU buffer. diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index b69eb8a..b68a9864 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -204,33 +204,6 @@ void printk_nmi_flush(void) __printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work); } -/** - * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system - * goes down. - * - * Similar to printk_nmi_flush() but it can be called even in NMI context when - * the system goes down. It does the best effort to get NMI messages into - * the main ring buffer. - * - * Note that it could try harder when there is only one CPU online. - */ -void printk_nmi_flush_on_panic(void) -{ - /* - * Make sure that we could access the main ring buffer. - * Do not risk a double release when more CPUs are up. - */ - if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) { - if (num_online_cpus() > 1) - return; - - debug_locks_off(); - raw_spin_lock_init(&logbuf_lock); - } - - printk_nmi_flush(); -} - void __init printk_nmi_init(void) { int cpu; diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 0a0e789..1509baa 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -245,7 +245,7 @@ __packed __aligned(4) * within the scheduler's rq lock. It must be released before calling * console_unlock() or anything else that might wake up a process. */ -DEFINE_RAW_SPINLOCK(logbuf_lock); +static DEFINE_RAW_SPINLOCK(logbuf_lock); #ifdef CONFIG_PRINTK DECLARE_WAIT_QUEUE_HEAD(log_wait); @@ -2447,6 +2447,33 @@ void console_unblank(void) } /** + * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system + * goes down. + * + * Similar to printk_nmi_flush() but it can be called even in NMI context when + * the system goes down. It does the best effort to get NMI messages into + * the main ring buffer. + * + * Note that it could try harder when there is only one CPU online. + */ +void printk_nmi_flush_on_panic(void) +{ + /* + * Make sure that we could access the main ring buffer. + * Do not risk a double release when more CPUs are up. + */ + if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) { + if (num_online_cpus() > 1) + return; + + debug_locks_off(); + raw_spin_lock_init(&logbuf_lock); + } + + printk_nmi_flush(); +} + +/** * console_flush_on_panic - flush console content on panic * * Immediately output all pending messages no matter what. ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic 2016-04-23 3:49 ` Sergey Senozhatsky @ 2016-04-26 14:21 ` Petr Mladek 2016-04-27 0:34 ` Sergey Senozhatsky 0 siblings, 1 reply; 32+ messages in thread From: Petr Mladek @ 2016-04-26 14:21 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux On Sat 2016-04-23 12:49:24, Sergey Senozhatsky wrote: > Hello Petr, > > On (04/21/16 13:48), Petr Mladek wrote: > > extern void printk_nmi_flush(void); > > +extern void printk_nmi_flush_on_panic(void); > > #else > > static inline void printk_nmi_flush(void) { } > > +static inline void printk_nmi_flush_on_panic(void) { } > [..] > > +void printk_nmi_flush_on_panic(void) > > +{ > > + /* > > + * Make sure that we could access the main ring buffer. > > + * Do not risk a double release when more CPUs are up. > > + */ > > + if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) { > > + if (num_online_cpus() > 1) > > + return; > > + > > + debug_locks_off(); > > + raw_spin_lock_init(&logbuf_lock); > > + } > > + > > + printk_nmi_flush(); > > +} > [..] > > -static DEFINE_RAW_SPINLOCK(logbuf_lock); > > +DEFINE_RAW_SPINLOCK(logbuf_lock); > > just an idea, > > how about doing it a bit differently? > > > move printk_nmi_flush_on_panic() to printk.c, and place it next to > printk_flush_on_panic() (so we will have two printk "flush-on-panic" > functions sitting together). /* printk_nmi_flush() is in printk.h, > so it's visible to printk anyway */ > > it also will let us keep logbuf_lock static, it's a bit too internal > to printk to expose it, I think. > > IOW, something like this? It is rather cosmetic change. I > --- > > kernel/printk/internal.h | 2 -- > kernel/printk/nmi.c | 27 --------------------------- > kernel/printk/printk.c | 29 ++++++++++++++++++++++++++++- > 3 files changed, 28 insertions(+), 30 deletions(-) > > diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h > index 7fd2838..341bedc 100644 > --- a/kernel/printk/internal.h > +++ b/kernel/printk/internal.h > @@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args); > > #ifdef CONFIG_PRINTK_NMI > > -extern raw_spinlock_t logbuf_lock; Well, it was exposed only in the internal.h header file. I consider this rather a cosmetic change and do not have strong opinion about it. :-) Anyway, thanks a lot for review. Best Regards, Petr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic 2016-04-26 14:21 ` Petr Mladek @ 2016-04-27 0:34 ` Sergey Senozhatsky 0 siblings, 0 replies; 32+ messages in thread From: Sergey Senozhatsky @ 2016-04-27 0:34 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux On (04/26/16 16:21), Petr Mladek wrote: [..] > > move printk_nmi_flush_on_panic() to printk.c, and place it next to > > printk_flush_on_panic() (so we will have two printk "flush-on-panic" > > functions sitting together). /* printk_nmi_flush() is in printk.h, > > so it's visible to printk anyway */ > > > > it also will let us keep logbuf_lock static, it's a bit too internal > > to printk to expose it, I think. > > > > IOW, something like this? > > It is rather cosmetic change. I oh yes, it is. just to keep similar functionality (flush, zap locks) in one place (printk). -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v5 0/4] Cleaning printk stuff in NMI context 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek ` (3 preceding siblings ...) 2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek @ 2016-04-27 0:36 ` Sergey Senozhatsky 4 siblings, 0 replies; 32+ messages in thread From: Sergey Senozhatsky @ 2016-04-27 0:36 UTC (permalink / raw) To: Petr Mladek Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux On (04/21/16 13:48), Petr Mladek wrote: > v5 adds changes suggested by Sergey Senozhatsky. It should not longer > conflict with his async printk patchset. passes my tests. (apart from that cosmetic-nano-nit) the series looks good to me Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -ss ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2017-04-28 14:47 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek 2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek 2016-04-27 9:31 ` Russell King - ARM Linux 2017-04-19 17:13 ` Steven Rostedt 2017-04-19 17:21 ` Peter Zijlstra 2017-04-20 3:31 ` Sergey Senozhatsky 2017-04-20 13:11 ` Petr Mladek 2017-04-21 1:57 ` Sergey Senozhatsky 2017-04-21 12:06 ` Petr Mladek 2017-04-24 2:17 ` Sergey Senozhatsky 2017-04-27 13:38 ` Petr Mladek 2017-04-27 14:31 ` Steven Rostedt 2017-04-27 15:28 ` Petr Mladek 2017-04-27 15:42 ` Steven Rostedt 2017-04-28 9:02 ` Peter Zijlstra 2017-04-28 13:44 ` Petr Mladek 2017-04-28 13:58 ` Peter Zijlstra 2017-04-28 14:47 ` Steven Rostedt 2017-04-27 16:14 ` Steven Rostedt 2017-04-28 1:35 ` Sergey Senozhatsky 2017-04-28 12:57 ` Petr Mladek 2017-04-28 14:16 ` Steven Rostedt 2017-04-28 1:25 ` Sergey Senozhatsky 2017-04-28 12:38 ` Petr Mladek 2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek 2016-04-27 9:34 ` Russell King - ARM Linux 2016-04-21 11:48 ` [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek 2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek 2016-04-23 3:49 ` Sergey Senozhatsky 2016-04-26 14:21 ` Petr Mladek 2016-04-27 0:34 ` Sergey Senozhatsky 2016-04-27 0:36 ` [PATCH v5 0/4] Cleaning printk stuff in NMI context Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).