All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH printk v2 0/5] printk: remove safe buffers
@ 2021-03-30 15:35 ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec, Russell King, Ingo Molnar, Marc Zyngier,
	Valentin Schneider, Mike Rapoport, Geert Uytterhoeven,
	Anshuman Khandual, Leonardo Bras, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Masahiro Yamada, Nathan Chancellor,
	Alexei Starovoitov, Nick Terrell, David Howells, Johannes Weiner,
	linux-arm-kernel

Hi,

Here is v2 of a series to remove the safe buffers. v1 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

This series is based on the printk-rework branch of
printk/linux.git:

    commit acebb5597ff1 ("kernel/printk.c: Fixed mundane typos")

Changes since v1:

- remove the printk nmi enter/exit tracking

- remove CONFIG_PRINTK_NMI config option

- use in_nmi() to detect NMI context

- remove unused printk_safe_enter/exit macros

- after switching to the dynamic buffer, copy over NMI records
  that may have arrived during the switch window

- use local_irq_*() instead of printk_safe_*() for console
  spinning

- use separate variables rather than arrays for the per-cpu
  recursion tracking

- make @syslog_lock a mutex instead of a spin_lock

- close the wait-read window for SYSLOG_ACTION_READ

- adjust various comments and commit messages as requested

John Ogness

[0] https://lore.kernel.org/lkml/20210316233326.10778-1-john.ogness@linutronix.de

John Ogness (5):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c          |   2 -
 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c     |   3 -
 include/linux/hardirq.h        |   2 -
 include/linux/printk.h         |  22 --
 init/Kconfig                   |   5 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  23 ---
 kernel/printk/printk.c         | 281 +++++++++++++++----------
 kernel/printk/printk_safe.c    | 362 +--------------------------------
 kernel/trace/trace.c           |   2 -
 lib/nmi_backtrace.c            |   6 -
 14 files changed, 171 insertions(+), 547 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 0/5] printk: remove safe buffers
@ 2021-03-30 15:35 ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Alexei Starovoitov,
	David Howells, Paul Mackerras, Tiezhu Yang, Will Deacon,
	Rafael Aquini, Marc Zyngier, Aneesh Kumar K.V, Masahiro Yamada,
	Russell King, Peter Zijlstra, Yue Hu, Ingo Molnar,
	Geert Uytterhoeven, Jordan Niethe, Valentin Schneider, Kees Cook,
	Paul E. McKenney, Anshuman Khandual, Alistair Popple,
	Frederic Weisbecker, Guilherme G. Piccoli, Nicholas Piggin,
	Nathan Chancellor, Nick Terrell, Steven Rostedt, Andy Lutomirski,
	Leonardo Bras, Thomas Gleixner, linux-arm-kernel, kexec,
	linux-kernel, Johannes Weiner, Sergey Senozhatsky,
	Eric Biederman, Andrew Morton, linuxppc-dev, Mike Rapoport

Hi,

Here is v2 of a series to remove the safe buffers. v1 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

This series is based on the printk-rework branch of
printk/linux.git:

    commit acebb5597ff1 ("kernel/printk.c: Fixed mundane typos")

Changes since v1:

- remove the printk nmi enter/exit tracking

- remove CONFIG_PRINTK_NMI config option

- use in_nmi() to detect NMI context

- remove unused printk_safe_enter/exit macros

- after switching to the dynamic buffer, copy over NMI records
  that may have arrived during the switch window

- use local_irq_*() instead of printk_safe_*() for console
  spinning

- use separate variables rather than arrays for the per-cpu
  recursion tracking

- make @syslog_lock a mutex instead of a spin_lock

- close the wait-read window for SYSLOG_ACTION_READ

- adjust various comments and commit messages as requested

John Ogness

[0] https://lore.kernel.org/lkml/20210316233326.10778-1-john.ogness@linutronix.de

John Ogness (5):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c          |   2 -
 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c     |   3 -
 include/linux/hardirq.h        |   2 -
 include/linux/printk.h         |  22 --
 init/Kconfig                   |   5 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  23 ---
 kernel/printk/printk.c         | 281 +++++++++++++++----------
 kernel/printk/printk_safe.c    | 362 +--------------------------------
 kernel/trace/trace.c           |   2 -
 lib/nmi_backtrace.c            |   6 -
 14 files changed, 171 insertions(+), 547 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 0/5] printk: remove safe buffers
@ 2021-03-30 15:35 ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec, Russell King, Ingo Molnar, Marc Zyngier,
	Valentin Schneider, Mike Rapoport, Geert Uytterhoeven,
	Anshuman Khandual, Leonardo Bras, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Masahiro Yamada, Nathan Chancellor,
	Alexei Starovoitov, Nick Terrell, David Howells, Johannes Weiner,
	linux-arm-kernel

Hi,

Here is v2 of a series to remove the safe buffers. v1 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

This series is based on the printk-rework branch of
printk/linux.git:

    commit acebb5597ff1 ("kernel/printk.c: Fixed mundane typos")

Changes since v1:

- remove the printk nmi enter/exit tracking

- remove CONFIG_PRINTK_NMI config option

- use in_nmi() to detect NMI context

- remove unused printk_safe_enter/exit macros

- after switching to the dynamic buffer, copy over NMI records
  that may have arrived during the switch window

- use local_irq_*() instead of printk_safe_*() for console
  spinning

- use separate variables rather than arrays for the per-cpu
  recursion tracking

- make @syslog_lock a mutex instead of a spin_lock

- close the wait-read window for SYSLOG_ACTION_READ

- adjust various comments and commit messages as requested

John Ogness

[0] https://lore.kernel.org/lkml/20210316233326.10778-1-john.ogness@linutronix.de

John Ogness (5):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c          |   2 -
 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c     |   3 -
 include/linux/hardirq.h        |   2 -
 include/linux/printk.h         |  22 --
 init/Kconfig                   |   5 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  23 ---
 kernel/printk/printk.c         | 281 +++++++++++++++----------
 kernel/printk/printk_safe.c    | 362 +--------------------------------
 kernel/trace/trace.c           |   2 -
 lib/nmi_backtrace.c            |   6 -
 14 files changed, 171 insertions(+), 547 deletions(-)

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 1/5] printk: track/limit recursion
  2021-03-30 15:35 ` John Ogness
  (?)
  (?)
@ 2021-03-30 15:35 ` John Ogness
  2021-04-01 10:00   ` Petr Mladek
  -1 siblings, 1 reply; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

Currently the printk safe buffers provide a form of recursion
protection by redirecting to the safe buffers whenever printk() is
recursively called.

In preparation for removal of the safe buffers, provide an alternate
explicit recursion protection. Recursion is limited to 3 levels
per-CPU and per-context.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 83 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 421c35571797..e971c0a9ec9e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1940,6 +1940,74 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
 	}
 }
 
+/*
+ * Recursion is tracked separately on each CPU. If NMIs are supported, an
+ * additional NMI context per CPU is also separately tracked. Until per-CPU
+ * is available, a separate "early tracking" is performed.
+ */
+static DEFINE_PER_CPU(u8, printk_count);
+static u8 printk_count_early;
+#ifdef CONFIG_HAVE_NMI
+static DEFINE_PER_CPU(u8, printk_count_nmi);
+static u8 printk_count_nmi_early;
+#endif
+
+/*
+ * Recursion is limited to keep the output sane. printk() should not require
+ * more than 1 level of recursion (allowing, for example, printk() to trigger
+ * a WARN), but a higher value is used in case some printk-internal errors
+ * exist, such as the ringbuffer validation checks failing.
+ */
+#define PRINTK_MAX_RECURSION 3
+
+/*
+ * Return a pointer to the dedicated counter for the CPU+context of the
+ * caller.
+ */
+static u8 *printk_recursion_counter(void)
+{
+#ifdef CONFIG_HAVE_NMI
+	if (in_nmi()) {
+		if (printk_percpu_data_ready())
+			return this_cpu_ptr(&printk_count_nmi);
+		return &printk_count_nmi_early;
+	}
+#endif
+	if (printk_percpu_data_ready())
+		return this_cpu_ptr(&printk_count);
+	return &printk_count_early;
+}
+
+/*
+ * Enter recursion tracking. Interrupts are disabled to simplify tracking.
+ * The caller must check the return value to see if the recursion is allowed.
+ * On failure, interrupts are not disabled.
+ */
+static bool printk_enter_irqsave(unsigned long *flags)
+{
+	u8 *count;
+
+	local_irq_save(*flags);
+	count = printk_recursion_counter();
+	if (*count > PRINTK_MAX_RECURSION) {
+		local_irq_restore(*flags);
+		return false;
+	}
+	(*count)++;
+
+	return true;
+}
+
+/* Exit recursion tracking, restoring interrupts. */
+static void printk_exit_irqrestore(unsigned long flags)
+{
+	u8 *count;
+
+	count = printk_recursion_counter();
+	(*count)--;
+	local_irq_restore(flags);
+}
+
 int printk_delay_msec __read_mostly;
 
 static inline void printk_delay(void)
@@ -2040,11 +2108,13 @@ int vprintk_store(int facility, int level,
 	struct prb_reserved_entry e;
 	enum log_flags lflags = 0;
 	struct printk_record r;
+	unsigned long irqflags;
 	u16 trunc_msg_len = 0;
 	char prefix_buf[8];
 	u16 reserve_size;
 	va_list args2;
 	u16 text_len;
+	int ret = 0;
 	u64 ts_nsec;
 
 	/*
@@ -2055,6 +2125,9 @@ int vprintk_store(int facility, int level,
 	 */
 	ts_nsec = local_clock();
 
+	if (!printk_enter_irqsave(&irqflags))
+		return 0;
+
 	/*
 	 * The sprintf needs to come first since the syslog prefix might be
 	 * passed in as a parameter. An extra byte must be reserved so that
@@ -2092,7 +2165,8 @@ int vprintk_store(int facility, int level,
 				prb_commit(&e);
 			}
 
-			return text_len;
+			ret = text_len;
+			goto out;
 		}
 	}
 
@@ -2108,7 +2182,7 @@ int vprintk_store(int facility, int level,
 
 		prb_rec_init_wr(&r, reserve_size + trunc_msg_len);
 		if (!prb_reserve(&e, prb, &r))
-			return 0;
+			goto out;
 	}
 
 	/* fill message */
@@ -2130,7 +2204,10 @@ int vprintk_store(int facility, int level,
 	else
 		prb_final_commit(&e);
 
-	return (text_len + trunc_msg_len);
+	ret = text_len + trunc_msg_len;
+out:
+	printk_exit_irqrestore(irqflags);
+	return ret;
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 2/5] printk: remove safe buffers
  2021-03-30 15:35 ` John Ogness
@ 2021-03-30 15:35   ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 Note: The follow-up patch removes the NMI tracking.

 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h         |  10 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  17 --
 kernel/printk/printk.c         | 137 +++++---------
 kernel/printk/printk_safe.c    | 333 +--------------------------------
 lib/nmi_backtrace.c            |   6 -
 9 files changed, 56 insertions(+), 457 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3ec7b443fe6b..7d2b339afcb0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index af3c15a1d41e..8ae46c5945d0 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -181,11 +181,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	wd_smp_unlock(&flags);
 
-	printk_safe_flush();
-	/*
-	 * printk_safe_flush() seems to require another print
-	 * before anything actually goes out to console.
-	 */
 	if (sysctl_hardlockup_all_cpu_backtrace)
 		trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)	\
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#define printk_safe_enter_irq()		\
-	do {					\
-		local_irq_disable();		\
-		__printk_safe_enter();		\
-	} while (0)
-
-#define printk_safe_exit_irq()			\
-	do {					\
-		__printk_safe_exit();		\
-		local_irq_enable();		\
-	} while (0)
-
 void defer_console_output(void);
 
 #else
@@ -61,9 +48,5 @@ void defer_console_output(void);
 #define printk_safe_enter_irqsave(flags) local_irq_save(flags)
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
-#define printk_safe_enter_irq() local_irq_disable()
-#define printk_safe_exit_irq() local_irq_enable()
-
-static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e971c0a9ec9e..f090d6a1b39e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -732,27 +732,22 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 
-	printk_safe_enter_irq();
 	if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
-			printk_safe_exit_irq();
 			goto out;
 		}
 
-		printk_safe_exit_irq();
 		ret = wait_event_interruptible(log_wait,
 				prb_read_valid(prb, atomic64_read(&user->seq), r));
 		if (ret)
 			goto out;
-		printk_safe_enter_irq();
 	}
 
 	if (r->info->seq != atomic64_read(&user->seq)) {
 		/* our last seen message is gone, return error and reset */
 		atomic64_set(&user->seq, r->info->seq);
 		ret = -EPIPE;
-		printk_safe_exit_irq();
 		goto out;
 	}
 
@@ -762,7 +757,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 				  &r->info->dev_info);
 
 	atomic64_set(&user->seq, r->info->seq + 1);
-	printk_safe_exit_irq();
 
 	if (len > count) {
 		ret = -EINVAL;
@@ -797,7 +791,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	if (offset)
 		return -ESPIPE;
 
-	printk_safe_enter_irq();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -818,7 +811,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	default:
 		ret = -EINVAL;
 	}
-	printk_safe_exit_irq();
 	return ret;
 }
 
@@ -833,7 +825,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &log_wait, wait);
 
-	printk_safe_enter_irq();
 	if (prb_read_valid_info(prb, atomic64_read(&user->seq), &info, NULL)) {
 		/* return error when data has vanished underneath us */
 		if (info.seq != atomic64_read(&user->seq))
@@ -841,7 +832,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 		else
 			ret = EPOLLIN|EPOLLRDNORM;
 	}
-	printk_safe_exit_irq();
 
 	return ret;
 }
@@ -874,9 +864,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	prb_rec_init_rd(&user->record, &user->info,
 			&user->text_buf[0], sizeof(user->text_buf));
 
-	printk_safe_enter_irq();
 	atomic64_set(&user->seq, prb_first_valid_seq(prb));
-	printk_safe_exit_irq();
 
 	file->private_data = user;
 	return 0;
@@ -1042,9 +1030,6 @@ static inline void log_buf_add_cpu(void) {}
 
 static void __init set_percpu_data_ready(void)
 {
-	printk_safe_init();
-	/* Make sure we set this flag only after printk_safe() init is done */
-	barrier();
 	__printk_percpu_data_ready = true;
 }
 
@@ -1082,6 +1067,7 @@ void __init setup_log_buf(int early)
 	struct prb_desc *new_descs;
 	struct printk_info info;
 	struct printk_record r;
+	unsigned int text_size;
 	size_t new_descs_size;
 	size_t new_infos_size;
 	unsigned long flags;
@@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
 		 new_descs, ilog2(new_descs_count),
 		 new_infos);
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
 
 	free = __LOG_BUF_LEN;
-	prb_for_each_record(0, &printk_rb_static, seq, &r)
-		free -= add_to_rb(&printk_rb_dynamic, &r);
+	prb_for_each_record(0, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
-	/*
-	 * This is early enough that everything is still running on the
-	 * boot CPU and interrupts are disabled. So no new messages will
-	 * appear during the transition to the dynamic buffer.
-	 */
 	prb = &printk_rb_dynamic;
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
+
+	/*
+	 * Copy any remaining messages that might have appeared from
+	 * NMI context after copying but before switching to the
+	 * dynamic buffer.
+	 */
+	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
 	if (seq != prb_next_seq(&printk_rb_static)) {
 		pr_err("dropped %llu messages\n",
@@ -1498,11 +1497,9 @@ static int syslog_print(char __user *buf, int size)
 		size_t n;
 		size_t skip;
 
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			break;
 		}
 		if (r.info->seq != syslog_seq) {
@@ -1531,8 +1528,7 @@ static int syslog_print(char __user *buf, int size)
 			syslog_partial += n;
 		} else
 			n = 0;
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 
 		if (!n)
 			break;
@@ -1566,7 +1562,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	time = printk_time;
-	printk_safe_enter_irq();
 	/*
 	 * Find first record that fits, including all following records,
 	 * into the user-provided buffer for this dump.
@@ -1587,23 +1582,20 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			break;
 		}
 
-		printk_safe_exit_irq();
 		if (copy_to_user(buf + len, text, textlen))
 			len = -EFAULT;
 		else
 			len += textlen;
-		printk_safe_enter_irq();
 
 		if (len < 0)
 			break;
 	}
 
 	if (clear) {
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		latched_seq_write(&clear_seq, seq);
-		raw_spin_unlock(&syslog_lock);
+		raw_spin_unlock_irq(&syslog_lock);
 	}
-	printk_safe_exit_irq();
 
 	kfree(text);
 	return len;
@@ -1611,11 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 
 static void syslog_clear(void)
 {
-	printk_safe_enter_irq();
-	raw_spin_lock(&syslog_lock);
+	raw_spin_lock_irq(&syslog_lock);
 	latched_seq_write(&clear_seq, prb_next_seq(prb));
-	raw_spin_unlock(&syslog_lock);
-	printk_safe_exit_irq();
+	raw_spin_unlock_irq(&syslog_lock);
 }
 
 /* Return a consistent copy of @syslog_seq. */
@@ -1703,12 +1693,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
 			/* No unread messages. */
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			return 0;
 		}
 		if (info.seq != syslog_seq) {
@@ -1736,8 +1724,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			}
 			error -= syslog_partial;
 		}
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
 static bool console_waiter;
 
 /**
- * console_lock_spinning_enable - mark beginning of code where another
+ * console_lock_spinning_enable_irqsave - mark beginning of code where another
  *	thread might safely busy wait
  *
  * This basically converts console_lock into a spinlock. This marks
  * the section where the console_lock owner can not sleep, because
  * there may be a waiter spinning (like a spinlock). Also it must be
  * ready to hand over the lock at the end of the section.
+ *
+ * This disables interrupts because the hand over to a waiter must not be
+ * interrupted until the hand over is completed (@console_waiter is cleared).
  */
-static void console_lock_spinning_enable(void)
+static void console_lock_spinning_enable_irqsave(unsigned long *flags)
 {
+	local_irq_save(*flags);
+
 	raw_spin_lock(&console_owner_lock);
 	console_owner = current;
 	raw_spin_unlock(&console_owner_lock);
@@ -1791,8 +1783,8 @@ static void console_lock_spinning_enable(void)
 }
 
 /**
- * console_lock_spinning_disable_and_check - mark end of code where another
- *	thread was able to busy wait and check if there is a waiter
+ * console_lock_spinning_disable_and_check_irqrestore - mark end of code where
+ *	another thread was able to busy wait and check if there is a waiter
  *
  * This is called at the end of the section where spinning is allowed.
  * It has two functions. First, it is a signal that it is no longer
@@ -1805,7 +1797,7 @@ static void console_lock_spinning_enable(void)
  *
  * Return: 1 if the lock rights were passed, 0 otherwise.
  */
-static int console_lock_spinning_disable_and_check(void)
+static int console_lock_spinning_disable_and_check_irqrestore(unsigned long flags)
 {
 	int waiter;
 
@@ -1816,6 +1808,7 @@ static int console_lock_spinning_disable_and_check(void)
 
 	if (!waiter) {
 		spin_release(&console_owner_dep_map, _THIS_IP_);
+		local_irq_restore(flags);
 		return 0;
 	}
 
@@ -1829,6 +1822,7 @@ static int console_lock_spinning_disable_and_check(void)
 	 * the up(). After this, the waiter is the console_lock owner.
 	 */
 	mutex_release(&console_lock_dep_map, _THIS_IP_);
+	local_irq_restore(flags);
 	return 1;
 }
 
@@ -1852,7 +1846,7 @@ static int console_trylock_spinning(void)
 	if (console_trylock())
 		return 1;
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	raw_spin_lock(&console_owner_lock);
 	owner = READ_ONCE(console_owner);
@@ -1873,7 +1867,7 @@ static int console_trylock_spinning(void)
 	 * that active printer isn't us (recursive printk?).
 	 */
 	if (!spin) {
-		printk_safe_exit_irqrestore(flags);
+		local_irq_restore(flags);
 		return 0;
 	}
 
@@ -1884,7 +1878,7 @@ static int console_trylock_spinning(void)
 		cpu_relax();
 	spin_release(&console_owner_dep_map, _THIS_IP_);
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
 	/*
 	 * The owner passed the console lock to us.
 	 * Since we did not spin on console lock, annotate
@@ -2216,7 +2210,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 {
 	int printed_len;
 	bool in_sched = false;
-	unsigned long flags;
 
 	/* Suppress unimportant messages after panic happens */
 	if (unlikely(suppress_printk))
@@ -2230,9 +2223,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	boot_delay_msec(level);
 	printk_delay();
 
-	printk_safe_enter_irqsave(flags);
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
-	printk_safe_exit_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
@@ -2663,7 +2654,6 @@ void console_unlock(void)
 		size_t ext_len = 0;
 		size_t len;
 
-		printk_safe_enter_irqsave(flags);
 skip:
 		if (!prb_read_valid(prb, console_seq, &r))
 			break;
@@ -2714,18 +2704,14 @@ void console_unlock(void)
 		 * finish. This task can not be preempted if there is a
 		 * waiter waiting to take over.
 		 */
-		console_lock_spinning_enable();
+		console_lock_spinning_enable_irqsave(&flags);
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
-		if (console_lock_spinning_disable_and_check()) {
-			printk_safe_exit_irqrestore(flags);
+		if (console_lock_spinning_disable_and_check_irqrestore(flags))
 			return;
-		}
-
-		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
@@ -2742,8 +2728,6 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	retry = prb_read_valid(prb, console_seq, NULL);
-	printk_safe_exit_irqrestore(flags);
-
 	if (retry && console_trylock())
 		goto again;
 }
@@ -2805,13 +2789,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	console_trylock();
 	console_may_schedule = 0;
 
-	if (mode == CONSOLE_REPLAY_ALL) {
-		unsigned long flags;
-
-		printk_safe_enter_irqsave(flags);
+	if (mode == CONSOLE_REPLAY_ALL)
 		console_seq = prb_first_valid_seq(prb);
-		printk_safe_exit_irqrestore(flags);
-	}
 	console_unlock();
 }
 
@@ -3463,14 +3442,12 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	struct printk_info info;
 	unsigned int line_count;
 	struct printk_record r;
-	unsigned long flags;
 	size_t l = 0;
 	bool ret = false;
 
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	prb_rec_init_rd(&r, &info, line, size);
 
 	/* Read text or count text lines? */
@@ -3491,7 +3468,6 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	iter->cur_seq = r.info->seq + 1;
 	ret = true;
 out:
-	printk_safe_exit_irqrestore(flags);
 	if (len)
 		*len = l;
 	return ret;
@@ -3523,7 +3499,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	u64 min_seq = latched_seq_read_nolock(&clear_seq);
 	struct printk_info info;
 	struct printk_record r;
-	unsigned long flags;
 	u64 seq;
 	u64 next_seq;
 	size_t len = 0;
@@ -3536,7 +3511,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	if (prb_read_valid_info(prb, iter->cur_seq, &info, NULL)) {
 		if (info.seq != iter->cur_seq) {
 			/* messages are gone, move to first available one */
@@ -3545,10 +3519,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	}
 
 	/* last entry */
-	if (iter->cur_seq >= iter->next_seq) {
-		printk_safe_exit_irqrestore(flags);
+	if (iter->cur_seq >= iter->next_seq)
 		goto out;
-	}
 
 	/*
 	 * Find first record that fits, including all following records,
@@ -3580,7 +3552,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 
 	iter->next_seq = next_seq;
 	ret = true;
-	printk_safe_exit_irqrestore(flags);
 out:
 	if (len_out)
 		*len_out = len;
@@ -3598,12 +3569,8 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
-	unsigned long flags;
-
-	printk_safe_enter_irqsave(flags);
 	iter->cur_seq = latched_seq_read_nolock(&clear_seq);
 	iter->next_seq = prb_next_seq(prb);
-	printk_safe_exit_irqrestore(flags);
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 7a1414622051..4b5df5c27334 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -15,286 +15,9 @@
 
 #include "internal.h"
 
-/*
- * In NMI and safe mode, printk() avoids taking locks. Instead,
- * it uses an alternative implementation that temporary stores
- * the strings into a per-CPU buffer. The content of the buffer
- * is later flushed into the main ring buffer via IRQ work.
- *
- * The alternative implementation is chosen transparently
- * by examining current printk() context mask stored in @printk_context
- * per-CPU variable.
- *
- * The implementation allows to flush the strings also from another CPU.
- * There are situations when we want to make sure that all buffers
- * were handled or when IRQs are blocked.
- */
-
-#define SAFE_LOG_BUF_LEN ((1 << CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT) -	\
-				sizeof(atomic_t) -			\
-				sizeof(atomic_t) -			\
-				sizeof(struct irq_work))
-
-struct printk_safe_seq_buf {
-	atomic_t		len;	/* length of written data */
-	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
-	unsigned char		buffer[SAFE_LOG_BUF_LEN];
-};
-
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
 static DEFINE_PER_CPU(int, printk_context);
 
-static DEFINE_RAW_SPINLOCK(safe_read_lock);
-
-#ifdef CONFIG_PRINTK_NMI
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
-#endif
-
-/* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
-{
-	if (printk_percpu_data_ready())
-		irq_work_queue(&s->work);
-}
-
-/*
- * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
- * have dedicated buffers, because otherwise printk-safe preempted by
- * NMI-printk would have overwritten the NMI messages.
- *
- * The messages are flushed from irq work (or from panic()), possibly,
- * from other CPU, concurrently with printk_safe_log_store(). Should this
- * happen, printk_safe_log_store() will notice the buffer->len mismatch
- * and repeat the write.
- */
-static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
-						const char *fmt, va_list args)
-{
-	int add;
-	size_t len;
-	va_list ap;
-
-again:
-	len = atomic_read(&s->len);
-
-	/* The trailing '\0' is not counted into len. */
-	if (len >= sizeof(s->buffer) - 1) {
-		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
-		return 0;
-	}
-
-	/*
-	 * Make sure that all old data have been read before the buffer
-	 * was reset. This is not needed when we just append data.
-	 */
-	if (!len)
-		smp_rmb();
-
-	va_copy(ap, args);
-	add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap);
-	va_end(ap);
-	if (!add)
-		return 0;
-
-	/*
-	 * Do it once again if the buffer has been flushed in the meantime.
-	 * Note that atomic_cmpxchg() is an implicit memory barrier that
-	 * makes sure that the data were written before updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, len + add) != len)
-		goto again;
-
-	queue_flush_work(s);
-	return add;
-}
-
-static inline void printk_safe_flush_line(const char *text, int len)
-{
-	/*
-	 * Avoid any console drivers calls from here, because we may be
-	 * in NMI or printk_safe context (when in panic). The messages
-	 * must go only into the ring buffer at this stage.  Consoles will
-	 * get explicitly called later when a crashdump is not generated.
-	 */
-	printk_deferred("%.*s", len, text);
-}
-
-/* printk part of the temporary buffer line by line */
-static int printk_safe_flush_buffer(const char *start, size_t len)
-{
-	const char *c, *end;
-	bool header;
-
-	c = start;
-	end = start + len;
-	header = true;
-
-	/* Print line by line. */
-	while (c < end) {
-		if (*c == '\n') {
-			printk_safe_flush_line(start, c - start + 1);
-			start = ++c;
-			header = true;
-			continue;
-		}
-
-		/* Handle continuous lines or missing new line. */
-		if ((c + 1 < end) && printk_get_level(c)) {
-			if (header) {
-				c = printk_skip_level(c);
-				continue;
-			}
-
-			printk_safe_flush_line(start, c - start);
-			start = c++;
-			header = true;
-			continue;
-		}
-
-		header = false;
-		c++;
-	}
-
-	/* Check if there was a partial line. Ignore pure header. */
-	if (start < end && !header) {
-		static const char newline[] = KERN_CONT "\n";
-
-		printk_safe_flush_line(start, end - start);
-		printk_safe_flush_line(newline, strlen(newline));
-	}
-
-	return len;
-}
-
-static void report_message_lost(struct printk_safe_seq_buf *s)
-{
-	int lost = atomic_xchg(&s->message_lost, 0);
-
-	if (lost)
-		printk_deferred("Lost %d message(s)!\n", lost);
-}
-
-/*
- * Flush data from the associated per-CPU buffer. The function
- * can be called either via IRQ work or independently.
- */
-static void __printk_safe_flush(struct irq_work *work)
-{
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
-	unsigned long flags;
-	size_t len;
-	int i;
-
-	/*
-	 * The lock has two functions. First, one reader has to flush all
-	 * available message to make the lockless synchronization with
-	 * writers easier. Second, we do not want to mix messages from
-	 * different CPUs. This is especially important when printing
-	 * a backtrace.
-	 */
-	raw_spin_lock_irqsave(&safe_read_lock, flags);
-
-	i = 0;
-more:
-	len = atomic_read(&s->len);
-
-	/*
-	 * This is just a paranoid check that nobody has manipulated
-	 * the buffer an unexpected way. If we printed something then
-	 * @len must only increase. Also it should never overflow the
-	 * buffer size.
-	 */
-	if ((i && i >= len) || len > sizeof(s->buffer)) {
-		const char *msg = "printk_safe_flush: internal error\n";
-
-		printk_safe_flush_line(msg, strlen(msg));
-		len = 0;
-	}
-
-	if (!len)
-		goto out; /* Someone else has already flushed the buffer. */
-
-	/* Make sure that data has been written up to the @len */
-	smp_rmb();
-	i += printk_safe_flush_buffer(s->buffer + i, len - i);
-
-	/*
-	 * Check that nothing has got added in the meantime and truncate
-	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
-	 * barrier that makes sure that the data were copied before
-	 * updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, 0) != len)
-		goto more;
-
-out:
-	report_message_lost(s);
-	raw_spin_unlock_irqrestore(&safe_read_lock, flags);
-}
-
-/**
- * printk_safe_flush - flush all per-cpu nmi buffers.
- *
- * The buffers are flushed automatically via IRQ work. This function
- * is useful only when someone wants to be sure that all buffers have
- * been flushed at some point.
- */
-void printk_safe_flush(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-#ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-#endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
-	}
-}
-
-/**
- * printk_safe_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_safe_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_safe_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the safe buffers.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (raw_spin_is_locked(&safe_read_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&safe_read_lock);
-	}
-
-	printk_safe_flush();
-}
-
 #ifdef CONFIG_PRINTK_NMI
-/*
- * Safe printk() for NMI context. It uses a per-CPU buffer to
- * store the message. NMIs are not nested, so there is always only
- * one writer running. But the buffer might get flushed from another
- * CPU, so we need to be careful.
- */
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 void noinstr printk_nmi_enter(void)
 {
 	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
@@ -309,9 +32,6 @@ void noinstr printk_nmi_exit(void)
  * Marks a code that might produce many messages in NMI context
  * and the risk of losing them is more critical than eventual
  * reordering.
- *
- * It has effect only when called in NMI context. Then printk()
- * will store the messages into the main logbuf directly.
  */
 void printk_nmi_direct_enter(void)
 {
@@ -324,27 +44,8 @@ void printk_nmi_direct_exit(void)
 	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
 }
 
-#else
-
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	return 0;
-}
-
 #endif /* CONFIG_PRINTK_NMI */
 
-/*
- * Lock-less printk(), to avoid deadlocks should the printk() recurse
- * into itself. It uses a per-CPU buffer to store the message, just like
- * NMI.
- */
-static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) &
+	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
+	     PRINTK_NMI_CONTEXT_MASK |
+	     PRINTK_SAFE_CONTEXT_MASK)) {
 		unsigned long flags;
 		int len;
 
@@ -380,35 +84,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		return len;
 	}
 
-	/* Use extra buffer in NMI. */
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		return vprintk_nmi(fmt, args);
-
-	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
-	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
-		return vprintk_safe(fmt, args);
-
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
-
-void __init printk_safe_init(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct printk_safe_seq_buf *s;
-
-		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-
-#ifdef CONFIG_PRINTK_NMI
-		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-#endif
-	}
-
-	/* Flush pending messages that did not have scheduled IRQ works. */
-	printk_safe_flush();
-}
-EXPORT_SYMBOL(vprintk);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 8abe1870dba4..b09a490f5f70 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
-	/*
-	 * Force flush any remote buffers that might be stuck in IRQ context
-	 * and therefore could not run their irq_work.
-	 */
-	printk_safe_flush();
-
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-03-30 15:35   ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 Note: The follow-up patch removes the NMI tracking.

 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h         |  10 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  17 --
 kernel/printk/printk.c         | 137 +++++---------
 kernel/printk/printk_safe.c    | 333 +--------------------------------
 lib/nmi_backtrace.c            |   6 -
 9 files changed, 56 insertions(+), 457 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3ec7b443fe6b..7d2b339afcb0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index af3c15a1d41e..8ae46c5945d0 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -181,11 +181,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	wd_smp_unlock(&flags);
 
-	printk_safe_flush();
-	/*
-	 * printk_safe_flush() seems to require another print
-	 * before anything actually goes out to console.
-	 */
 	if (sysctl_hardlockup_all_cpu_backtrace)
 		trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)	\
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#define printk_safe_enter_irq()		\
-	do {					\
-		local_irq_disable();		\
-		__printk_safe_enter();		\
-	} while (0)
-
-#define printk_safe_exit_irq()			\
-	do {					\
-		__printk_safe_exit();		\
-		local_irq_enable();		\
-	} while (0)
-
 void defer_console_output(void);
 
 #else
@@ -61,9 +48,5 @@ void defer_console_output(void);
 #define printk_safe_enter_irqsave(flags) local_irq_save(flags)
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
-#define printk_safe_enter_irq() local_irq_disable()
-#define printk_safe_exit_irq() local_irq_enable()
-
-static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e971c0a9ec9e..f090d6a1b39e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -732,27 +732,22 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 
-	printk_safe_enter_irq();
 	if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
-			printk_safe_exit_irq();
 			goto out;
 		}
 
-		printk_safe_exit_irq();
 		ret = wait_event_interruptible(log_wait,
 				prb_read_valid(prb, atomic64_read(&user->seq), r));
 		if (ret)
 			goto out;
-		printk_safe_enter_irq();
 	}
 
 	if (r->info->seq != atomic64_read(&user->seq)) {
 		/* our last seen message is gone, return error and reset */
 		atomic64_set(&user->seq, r->info->seq);
 		ret = -EPIPE;
-		printk_safe_exit_irq();
 		goto out;
 	}
 
@@ -762,7 +757,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 				  &r->info->dev_info);
 
 	atomic64_set(&user->seq, r->info->seq + 1);
-	printk_safe_exit_irq();
 
 	if (len > count) {
 		ret = -EINVAL;
@@ -797,7 +791,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	if (offset)
 		return -ESPIPE;
 
-	printk_safe_enter_irq();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -818,7 +811,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	default:
 		ret = -EINVAL;
 	}
-	printk_safe_exit_irq();
 	return ret;
 }
 
@@ -833,7 +825,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &log_wait, wait);
 
-	printk_safe_enter_irq();
 	if (prb_read_valid_info(prb, atomic64_read(&user->seq), &info, NULL)) {
 		/* return error when data has vanished underneath us */
 		if (info.seq != atomic64_read(&user->seq))
@@ -841,7 +832,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 		else
 			ret = EPOLLIN|EPOLLRDNORM;
 	}
-	printk_safe_exit_irq();
 
 	return ret;
 }
@@ -874,9 +864,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	prb_rec_init_rd(&user->record, &user->info,
 			&user->text_buf[0], sizeof(user->text_buf));
 
-	printk_safe_enter_irq();
 	atomic64_set(&user->seq, prb_first_valid_seq(prb));
-	printk_safe_exit_irq();
 
 	file->private_data = user;
 	return 0;
@@ -1042,9 +1030,6 @@ static inline void log_buf_add_cpu(void) {}
 
 static void __init set_percpu_data_ready(void)
 {
-	printk_safe_init();
-	/* Make sure we set this flag only after printk_safe() init is done */
-	barrier();
 	__printk_percpu_data_ready = true;
 }
 
@@ -1082,6 +1067,7 @@ void __init setup_log_buf(int early)
 	struct prb_desc *new_descs;
 	struct printk_info info;
 	struct printk_record r;
+	unsigned int text_size;
 	size_t new_descs_size;
 	size_t new_infos_size;
 	unsigned long flags;
@@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
 		 new_descs, ilog2(new_descs_count),
 		 new_infos);
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
 
 	free = __LOG_BUF_LEN;
-	prb_for_each_record(0, &printk_rb_static, seq, &r)
-		free -= add_to_rb(&printk_rb_dynamic, &r);
+	prb_for_each_record(0, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
-	/*
-	 * This is early enough that everything is still running on the
-	 * boot CPU and interrupts are disabled. So no new messages will
-	 * appear during the transition to the dynamic buffer.
-	 */
 	prb = &printk_rb_dynamic;
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
+
+	/*
+	 * Copy any remaining messages that might have appeared from
+	 * NMI context after copying but before switching to the
+	 * dynamic buffer.
+	 */
+	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
 	if (seq != prb_next_seq(&printk_rb_static)) {
 		pr_err("dropped %llu messages\n",
@@ -1498,11 +1497,9 @@ static int syslog_print(char __user *buf, int size)
 		size_t n;
 		size_t skip;
 
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			break;
 		}
 		if (r.info->seq != syslog_seq) {
@@ -1531,8 +1528,7 @@ static int syslog_print(char __user *buf, int size)
 			syslog_partial += n;
 		} else
 			n = 0;
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 
 		if (!n)
 			break;
@@ -1566,7 +1562,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	time = printk_time;
-	printk_safe_enter_irq();
 	/*
 	 * Find first record that fits, including all following records,
 	 * into the user-provided buffer for this dump.
@@ -1587,23 +1582,20 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			break;
 		}
 
-		printk_safe_exit_irq();
 		if (copy_to_user(buf + len, text, textlen))
 			len = -EFAULT;
 		else
 			len += textlen;
-		printk_safe_enter_irq();
 
 		if (len < 0)
 			break;
 	}
 
 	if (clear) {
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		latched_seq_write(&clear_seq, seq);
-		raw_spin_unlock(&syslog_lock);
+		raw_spin_unlock_irq(&syslog_lock);
 	}
-	printk_safe_exit_irq();
 
 	kfree(text);
 	return len;
@@ -1611,11 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 
 static void syslog_clear(void)
 {
-	printk_safe_enter_irq();
-	raw_spin_lock(&syslog_lock);
+	raw_spin_lock_irq(&syslog_lock);
 	latched_seq_write(&clear_seq, prb_next_seq(prb));
-	raw_spin_unlock(&syslog_lock);
-	printk_safe_exit_irq();
+	raw_spin_unlock_irq(&syslog_lock);
 }
 
 /* Return a consistent copy of @syslog_seq. */
@@ -1703,12 +1693,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
 			/* No unread messages. */
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			return 0;
 		}
 		if (info.seq != syslog_seq) {
@@ -1736,8 +1724,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			}
 			error -= syslog_partial;
 		}
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
 static bool console_waiter;
 
 /**
- * console_lock_spinning_enable - mark beginning of code where another
+ * console_lock_spinning_enable_irqsave - mark beginning of code where another
  *	thread might safely busy wait
  *
  * This basically converts console_lock into a spinlock. This marks
  * the section where the console_lock owner can not sleep, because
  * there may be a waiter spinning (like a spinlock). Also it must be
  * ready to hand over the lock at the end of the section.
+ *
+ * This disables interrupts because the hand over to a waiter must not be
+ * interrupted until the hand over is completed (@console_waiter is cleared).
  */
-static void console_lock_spinning_enable(void)
+static void console_lock_spinning_enable_irqsave(unsigned long *flags)
 {
+	local_irq_save(*flags);
+
 	raw_spin_lock(&console_owner_lock);
 	console_owner = current;
 	raw_spin_unlock(&console_owner_lock);
@@ -1791,8 +1783,8 @@ static void console_lock_spinning_enable(void)
 }
 
 /**
- * console_lock_spinning_disable_and_check - mark end of code where another
- *	thread was able to busy wait and check if there is a waiter
+ * console_lock_spinning_disable_and_check_irqrestore - mark end of code where
+ *	another thread was able to busy wait and check if there is a waiter
  *
  * This is called at the end of the section where spinning is allowed.
  * It has two functions. First, it is a signal that it is no longer
@@ -1805,7 +1797,7 @@ static void console_lock_spinning_enable(void)
  *
  * Return: 1 if the lock rights were passed, 0 otherwise.
  */
-static int console_lock_spinning_disable_and_check(void)
+static int console_lock_spinning_disable_and_check_irqrestore(unsigned long flags)
 {
 	int waiter;
 
@@ -1816,6 +1808,7 @@ static int console_lock_spinning_disable_and_check(void)
 
 	if (!waiter) {
 		spin_release(&console_owner_dep_map, _THIS_IP_);
+		local_irq_restore(flags);
 		return 0;
 	}
 
@@ -1829,6 +1822,7 @@ static int console_lock_spinning_disable_and_check(void)
 	 * the up(). After this, the waiter is the console_lock owner.
 	 */
 	mutex_release(&console_lock_dep_map, _THIS_IP_);
+	local_irq_restore(flags);
 	return 1;
 }
 
@@ -1852,7 +1846,7 @@ static int console_trylock_spinning(void)
 	if (console_trylock())
 		return 1;
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	raw_spin_lock(&console_owner_lock);
 	owner = READ_ONCE(console_owner);
@@ -1873,7 +1867,7 @@ static int console_trylock_spinning(void)
 	 * that active printer isn't us (recursive printk?).
 	 */
 	if (!spin) {
-		printk_safe_exit_irqrestore(flags);
+		local_irq_restore(flags);
 		return 0;
 	}
 
@@ -1884,7 +1878,7 @@ static int console_trylock_spinning(void)
 		cpu_relax();
 	spin_release(&console_owner_dep_map, _THIS_IP_);
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
 	/*
 	 * The owner passed the console lock to us.
 	 * Since we did not spin on console lock, annotate
@@ -2216,7 +2210,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 {
 	int printed_len;
 	bool in_sched = false;
-	unsigned long flags;
 
 	/* Suppress unimportant messages after panic happens */
 	if (unlikely(suppress_printk))
@@ -2230,9 +2223,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	boot_delay_msec(level);
 	printk_delay();
 
-	printk_safe_enter_irqsave(flags);
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
-	printk_safe_exit_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
@@ -2663,7 +2654,6 @@ void console_unlock(void)
 		size_t ext_len = 0;
 		size_t len;
 
-		printk_safe_enter_irqsave(flags);
 skip:
 		if (!prb_read_valid(prb, console_seq, &r))
 			break;
@@ -2714,18 +2704,14 @@ void console_unlock(void)
 		 * finish. This task can not be preempted if there is a
 		 * waiter waiting to take over.
 		 */
-		console_lock_spinning_enable();
+		console_lock_spinning_enable_irqsave(&flags);
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
-		if (console_lock_spinning_disable_and_check()) {
-			printk_safe_exit_irqrestore(flags);
+		if (console_lock_spinning_disable_and_check_irqrestore(flags))
 			return;
-		}
-
-		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
@@ -2742,8 +2728,6 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	retry = prb_read_valid(prb, console_seq, NULL);
-	printk_safe_exit_irqrestore(flags);
-
 	if (retry && console_trylock())
 		goto again;
 }
@@ -2805,13 +2789,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	console_trylock();
 	console_may_schedule = 0;
 
-	if (mode == CONSOLE_REPLAY_ALL) {
-		unsigned long flags;
-
-		printk_safe_enter_irqsave(flags);
+	if (mode == CONSOLE_REPLAY_ALL)
 		console_seq = prb_first_valid_seq(prb);
-		printk_safe_exit_irqrestore(flags);
-	}
 	console_unlock();
 }
 
@@ -3463,14 +3442,12 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	struct printk_info info;
 	unsigned int line_count;
 	struct printk_record r;
-	unsigned long flags;
 	size_t l = 0;
 	bool ret = false;
 
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	prb_rec_init_rd(&r, &info, line, size);
 
 	/* Read text or count text lines? */
@@ -3491,7 +3468,6 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	iter->cur_seq = r.info->seq + 1;
 	ret = true;
 out:
-	printk_safe_exit_irqrestore(flags);
 	if (len)
 		*len = l;
 	return ret;
@@ -3523,7 +3499,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	u64 min_seq = latched_seq_read_nolock(&clear_seq);
 	struct printk_info info;
 	struct printk_record r;
-	unsigned long flags;
 	u64 seq;
 	u64 next_seq;
 	size_t len = 0;
@@ -3536,7 +3511,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	if (prb_read_valid_info(prb, iter->cur_seq, &info, NULL)) {
 		if (info.seq != iter->cur_seq) {
 			/* messages are gone, move to first available one */
@@ -3545,10 +3519,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	}
 
 	/* last entry */
-	if (iter->cur_seq >= iter->next_seq) {
-		printk_safe_exit_irqrestore(flags);
+	if (iter->cur_seq >= iter->next_seq)
 		goto out;
-	}
 
 	/*
 	 * Find first record that fits, including all following records,
@@ -3580,7 +3552,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 
 	iter->next_seq = next_seq;
 	ret = true;
-	printk_safe_exit_irqrestore(flags);
 out:
 	if (len_out)
 		*len_out = len;
@@ -3598,12 +3569,8 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
-	unsigned long flags;
-
-	printk_safe_enter_irqsave(flags);
 	iter->cur_seq = latched_seq_read_nolock(&clear_seq);
 	iter->next_seq = prb_next_seq(prb);
-	printk_safe_exit_irqrestore(flags);
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 7a1414622051..4b5df5c27334 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -15,286 +15,9 @@
 
 #include "internal.h"
 
-/*
- * In NMI and safe mode, printk() avoids taking locks. Instead,
- * it uses an alternative implementation that temporary stores
- * the strings into a per-CPU buffer. The content of the buffer
- * is later flushed into the main ring buffer via IRQ work.
- *
- * The alternative implementation is chosen transparently
- * by examining current printk() context mask stored in @printk_context
- * per-CPU variable.
- *
- * The implementation allows to flush the strings also from another CPU.
- * There are situations when we want to make sure that all buffers
- * were handled or when IRQs are blocked.
- */
-
-#define SAFE_LOG_BUF_LEN ((1 << CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT) -	\
-				sizeof(atomic_t) -			\
-				sizeof(atomic_t) -			\
-				sizeof(struct irq_work))
-
-struct printk_safe_seq_buf {
-	atomic_t		len;	/* length of written data */
-	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
-	unsigned char		buffer[SAFE_LOG_BUF_LEN];
-};
-
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
 static DEFINE_PER_CPU(int, printk_context);
 
-static DEFINE_RAW_SPINLOCK(safe_read_lock);
-
-#ifdef CONFIG_PRINTK_NMI
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
-#endif
-
-/* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
-{
-	if (printk_percpu_data_ready())
-		irq_work_queue(&s->work);
-}
-
-/*
- * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
- * have dedicated buffers, because otherwise printk-safe preempted by
- * NMI-printk would have overwritten the NMI messages.
- *
- * The messages are flushed from irq work (or from panic()), possibly,
- * from other CPU, concurrently with printk_safe_log_store(). Should this
- * happen, printk_safe_log_store() will notice the buffer->len mismatch
- * and repeat the write.
- */
-static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
-						const char *fmt, va_list args)
-{
-	int add;
-	size_t len;
-	va_list ap;
-
-again:
-	len = atomic_read(&s->len);
-
-	/* The trailing '\0' is not counted into len. */
-	if (len >= sizeof(s->buffer) - 1) {
-		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
-		return 0;
-	}
-
-	/*
-	 * Make sure that all old data have been read before the buffer
-	 * was reset. This is not needed when we just append data.
-	 */
-	if (!len)
-		smp_rmb();
-
-	va_copy(ap, args);
-	add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap);
-	va_end(ap);
-	if (!add)
-		return 0;
-
-	/*
-	 * Do it once again if the buffer has been flushed in the meantime.
-	 * Note that atomic_cmpxchg() is an implicit memory barrier that
-	 * makes sure that the data were written before updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, len + add) != len)
-		goto again;
-
-	queue_flush_work(s);
-	return add;
-}
-
-static inline void printk_safe_flush_line(const char *text, int len)
-{
-	/*
-	 * Avoid any console drivers calls from here, because we may be
-	 * in NMI or printk_safe context (when in panic). The messages
-	 * must go only into the ring buffer at this stage.  Consoles will
-	 * get explicitly called later when a crashdump is not generated.
-	 */
-	printk_deferred("%.*s", len, text);
-}
-
-/* printk part of the temporary buffer line by line */
-static int printk_safe_flush_buffer(const char *start, size_t len)
-{
-	const char *c, *end;
-	bool header;
-
-	c = start;
-	end = start + len;
-	header = true;
-
-	/* Print line by line. */
-	while (c < end) {
-		if (*c == '\n') {
-			printk_safe_flush_line(start, c - start + 1);
-			start = ++c;
-			header = true;
-			continue;
-		}
-
-		/* Handle continuous lines or missing new line. */
-		if ((c + 1 < end) && printk_get_level(c)) {
-			if (header) {
-				c = printk_skip_level(c);
-				continue;
-			}
-
-			printk_safe_flush_line(start, c - start);
-			start = c++;
-			header = true;
-			continue;
-		}
-
-		header = false;
-		c++;
-	}
-
-	/* Check if there was a partial line. Ignore pure header. */
-	if (start < end && !header) {
-		static const char newline[] = KERN_CONT "\n";
-
-		printk_safe_flush_line(start, end - start);
-		printk_safe_flush_line(newline, strlen(newline));
-	}
-
-	return len;
-}
-
-static void report_message_lost(struct printk_safe_seq_buf *s)
-{
-	int lost = atomic_xchg(&s->message_lost, 0);
-
-	if (lost)
-		printk_deferred("Lost %d message(s)!\n", lost);
-}
-
-/*
- * Flush data from the associated per-CPU buffer. The function
- * can be called either via IRQ work or independently.
- */
-static void __printk_safe_flush(struct irq_work *work)
-{
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
-	unsigned long flags;
-	size_t len;
-	int i;
-
-	/*
-	 * The lock has two functions. First, one reader has to flush all
-	 * available message to make the lockless synchronization with
-	 * writers easier. Second, we do not want to mix messages from
-	 * different CPUs. This is especially important when printing
-	 * a backtrace.
-	 */
-	raw_spin_lock_irqsave(&safe_read_lock, flags);
-
-	i = 0;
-more:
-	len = atomic_read(&s->len);
-
-	/*
-	 * This is just a paranoid check that nobody has manipulated
-	 * the buffer an unexpected way. If we printed something then
-	 * @len must only increase. Also it should never overflow the
-	 * buffer size.
-	 */
-	if ((i && i >= len) || len > sizeof(s->buffer)) {
-		const char *msg = "printk_safe_flush: internal error\n";
-
-		printk_safe_flush_line(msg, strlen(msg));
-		len = 0;
-	}
-
-	if (!len)
-		goto out; /* Someone else has already flushed the buffer. */
-
-	/* Make sure that data has been written up to the @len */
-	smp_rmb();
-	i += printk_safe_flush_buffer(s->buffer + i, len - i);
-
-	/*
-	 * Check that nothing has got added in the meantime and truncate
-	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
-	 * barrier that makes sure that the data were copied before
-	 * updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, 0) != len)
-		goto more;
-
-out:
-	report_message_lost(s);
-	raw_spin_unlock_irqrestore(&safe_read_lock, flags);
-}
-
-/**
- * printk_safe_flush - flush all per-cpu nmi buffers.
- *
- * The buffers are flushed automatically via IRQ work. This function
- * is useful only when someone wants to be sure that all buffers have
- * been flushed at some point.
- */
-void printk_safe_flush(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-#ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-#endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
-	}
-}
-
-/**
- * printk_safe_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_safe_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_safe_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the safe buffers.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (raw_spin_is_locked(&safe_read_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&safe_read_lock);
-	}
-
-	printk_safe_flush();
-}
-
 #ifdef CONFIG_PRINTK_NMI
-/*
- * Safe printk() for NMI context. It uses a per-CPU buffer to
- * store the message. NMIs are not nested, so there is always only
- * one writer running. But the buffer might get flushed from another
- * CPU, so we need to be careful.
- */
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 void noinstr printk_nmi_enter(void)
 {
 	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
@@ -309,9 +32,6 @@ void noinstr printk_nmi_exit(void)
  * Marks a code that might produce many messages in NMI context
  * and the risk of losing them is more critical than eventual
  * reordering.
- *
- * It has effect only when called in NMI context. Then printk()
- * will store the messages into the main logbuf directly.
  */
 void printk_nmi_direct_enter(void)
 {
@@ -324,27 +44,8 @@ void printk_nmi_direct_exit(void)
 	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
 }
 
-#else
-
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	return 0;
-}
-
 #endif /* CONFIG_PRINTK_NMI */
 
-/*
- * Lock-less printk(), to avoid deadlocks should the printk() recurse
- * into itself. It uses a per-CPU buffer to store the message, just like
- * NMI.
- */
-static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) &
+	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
+	     PRINTK_NMI_CONTEXT_MASK |
+	     PRINTK_SAFE_CONTEXT_MASK)) {
 		unsigned long flags;
 		int len;
 
@@ -380,35 +84,6 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		return len;
 	}
 
-	/* Use extra buffer in NMI. */
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		return vprintk_nmi(fmt, args);
-
-	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
-	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
-		return vprintk_safe(fmt, args);
-
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
-
-void __init printk_safe_init(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct printk_safe_seq_buf *s;
-
-		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-
-#ifdef CONFIG_PRINTK_NMI
-		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-#endif
-	}
-
-	/* Flush pending messages that did not have scheduled IRQ works. */
-	printk_safe_flush();
-}
-EXPORT_SYMBOL(vprintk);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 8abe1870dba4..b09a490f5f70 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
-	/*
-	 * Force flush any remote buffers that might be stuck in IRQ context
-	 * and therefore could not run their irq_work.
-	 */
-	printk_safe_flush();
-
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 3/5] printk: remove NMI tracking
  2021-03-30 15:35 ` John Ogness
  (?)
@ 2021-03-30 15:35   ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Mike Rapoport,
	Geert Uytterhoeven, Anshuman Khandual, Leonardo Bras,
	Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Paul E. McKenney, Masahiro Yamada,
	Nathan Chancellor, Alexei Starovoitov, Kees Cook, Nick Terrell,
	David Howells, Johannes Weiner, linux-arm-kernel, linuxppc-dev

All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 arch/arm/kernel/smp.c       |  2 --
 arch/powerpc/kexec/crash.c  |  3 ---
 include/linux/hardirq.h     |  2 --
 include/linux/printk.h      | 12 ------------
 init/Kconfig                |  5 -----
 kernel/printk/internal.h    |  6 ------
 kernel/printk/printk_safe.c | 37 +------------------------------------
 kernel/trace/trace.c        |  2 --
 8 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 5c48eb4fd0e5..77a720c1f402 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -671,9 +671,7 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_BACKTRACE:
-		printk_nmi_enter();
 		nmi_cpu_backtrace(get_irq_regs());
-		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index c9a889880214..d488311efab1 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -311,9 +311,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
-	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
-	printk_nmi_enter();
-
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..0926e9ca4d85 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -115,7 +115,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		printk_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
 		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 	} while (0)
@@ -134,7 +133,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		BUG_ON(!in_nmi());				\
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
-		printk_nmi_exit();				\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 2476796c1150..77f66625706e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -149,18 +149,6 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
 struct dev_printk_info;
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 096e1af5c586..ea58c0d30a97 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1482,11 +1482,6 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
-config PRINTK_NMI
-	def_bool y
-	depends on PRINTK
-	depends on HAVE_NMI
-
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6cc35c5de890..ff890ae3ee6a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -6,12 +6,6 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x007ffffff
-#define PRINTK_NMI_DIRECT_CONTEXT_MASK	0x008000000
-#define PRINTK_NMI_CONTEXT_MASK		0xff0000000
-
-#define PRINTK_NMI_CONTEXT_OFFSET	0x010000000
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 4b5df5c27334..4da1ab3332d6 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -4,12 +4,9 @@
  */
 
 #include <linux/preempt.h>
-#include <linux/spinlock.h>
-#include <linux/debug_locks.h>
 #include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
-#include <linux/irq_work.h>
 #include <linux/printk.h>
 #include <linux/kprobes.h>
 
@@ -17,35 +14,6 @@
 
 static DEFINE_PER_CPU(int, printk_context);
 
-#ifdef CONFIG_PRINTK_NMI
-void noinstr printk_nmi_enter(void)
-{
-	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-void noinstr printk_nmi_exit(void)
-{
-	this_cpu_sub(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-/*
- * Marks a code that might produce many messages in NMI context
- * and the risk of losing them is more critical than eventual
- * reordering.
- */
-void printk_nmi_direct_enter(void)
-{
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-void printk_nmi_direct_exit(void)
-{
-	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-#endif /* CONFIG_PRINTK_NMI */
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -70,10 +38,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) &
-	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
-	     PRINTK_NMI_CONTEXT_MASK |
-	     PRINTK_SAFE_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) || in_nmi()) {
 		unsigned long flags;
 		int len;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b5815a022ecc..83a3663d1dc0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9344,7 +9344,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	tracing_off();
 
 	local_irq_save(flags);
-	printk_nmi_direct_enter();
 
 	/* Simulate the iterator */
 	trace_init_global_iter(&iter);
@@ -9424,7 +9423,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 		atomic_dec(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
 	}
 	atomic_dec(&dump_running);
-	printk_nmi_direct_exit();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 3/5] printk: remove NMI tracking
@ 2021-03-30 15:35   ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Alexei Starovoitov,
	David Howells, Paul Mackerras, Will Deacon, Leonardo Bras,
	Marc Zyngier, Masahiro Yamada, Russell King, Ingo Molnar,
	Geert Uytterhoeven, Valentin Schneider, Kees Cook,
	Paul E. McKenney, Anshuman Khandual, Frederic Weisbecker,
	Johannes Weiner, Steven Rostedt, Nathan Chancellor, Nick Terrell,
	Andy Lutomirski, Thomas Gleixner, linux-arm-kernel, linux-kernel,
	Sergey Senozhatsky, Andrew Morton, linuxppc-dev, Mike Rapoport

All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 arch/arm/kernel/smp.c       |  2 --
 arch/powerpc/kexec/crash.c  |  3 ---
 include/linux/hardirq.h     |  2 --
 include/linux/printk.h      | 12 ------------
 init/Kconfig                |  5 -----
 kernel/printk/internal.h    |  6 ------
 kernel/printk/printk_safe.c | 37 +------------------------------------
 kernel/trace/trace.c        |  2 --
 8 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 5c48eb4fd0e5..77a720c1f402 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -671,9 +671,7 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_BACKTRACE:
-		printk_nmi_enter();
 		nmi_cpu_backtrace(get_irq_regs());
-		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index c9a889880214..d488311efab1 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -311,9 +311,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
-	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
-	printk_nmi_enter();
-
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..0926e9ca4d85 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -115,7 +115,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		printk_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
 		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 	} while (0)
@@ -134,7 +133,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		BUG_ON(!in_nmi());				\
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
-		printk_nmi_exit();				\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 2476796c1150..77f66625706e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -149,18 +149,6 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
 struct dev_printk_info;
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 096e1af5c586..ea58c0d30a97 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1482,11 +1482,6 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
-config PRINTK_NMI
-	def_bool y
-	depends on PRINTK
-	depends on HAVE_NMI
-
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6cc35c5de890..ff890ae3ee6a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -6,12 +6,6 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x007ffffff
-#define PRINTK_NMI_DIRECT_CONTEXT_MASK	0x008000000
-#define PRINTK_NMI_CONTEXT_MASK		0xff0000000
-
-#define PRINTK_NMI_CONTEXT_OFFSET	0x010000000
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 4b5df5c27334..4da1ab3332d6 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -4,12 +4,9 @@
  */
 
 #include <linux/preempt.h>
-#include <linux/spinlock.h>
-#include <linux/debug_locks.h>
 #include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
-#include <linux/irq_work.h>
 #include <linux/printk.h>
 #include <linux/kprobes.h>
 
@@ -17,35 +14,6 @@
 
 static DEFINE_PER_CPU(int, printk_context);
 
-#ifdef CONFIG_PRINTK_NMI
-void noinstr printk_nmi_enter(void)
-{
-	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-void noinstr printk_nmi_exit(void)
-{
-	this_cpu_sub(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-/*
- * Marks a code that might produce many messages in NMI context
- * and the risk of losing them is more critical than eventual
- * reordering.
- */
-void printk_nmi_direct_enter(void)
-{
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-void printk_nmi_direct_exit(void)
-{
-	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-#endif /* CONFIG_PRINTK_NMI */
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -70,10 +38,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) &
-	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
-	     PRINTK_NMI_CONTEXT_MASK |
-	     PRINTK_SAFE_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) || in_nmi()) {
 		unsigned long flags;
 		int len;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b5815a022ecc..83a3663d1dc0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9344,7 +9344,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	tracing_off();
 
 	local_irq_save(flags);
-	printk_nmi_direct_enter();
 
 	/* Simulate the iterator */
 	trace_init_global_iter(&iter);
@@ -9424,7 +9423,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 		atomic_dec(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
 	}
 	atomic_dec(&dump_running);
-	printk_nmi_direct_exit();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 3/5] printk: remove NMI tracking
@ 2021-03-30 15:35   ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Mike Rapoport,
	Geert Uytterhoeven, Anshuman Khandual, Leonardo Bras,
	Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Paul E. McKenney, Masahiro Yamada,
	Nathan Chancellor, Alexei Starovoitov, Kees Cook, Nick Terrell,
	David Howells, Johannes Weiner, linux-arm-kernel, linuxppc-dev

All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 arch/arm/kernel/smp.c       |  2 --
 arch/powerpc/kexec/crash.c  |  3 ---
 include/linux/hardirq.h     |  2 --
 include/linux/printk.h      | 12 ------------
 init/Kconfig                |  5 -----
 kernel/printk/internal.h    |  6 ------
 kernel/printk/printk_safe.c | 37 +------------------------------------
 kernel/trace/trace.c        |  2 --
 8 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 5c48eb4fd0e5..77a720c1f402 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -671,9 +671,7 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_BACKTRACE:
-		printk_nmi_enter();
 		nmi_cpu_backtrace(get_irq_regs());
-		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index c9a889880214..d488311efab1 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -311,9 +311,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
-	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
-	printk_nmi_enter();
-
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..0926e9ca4d85 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -115,7 +115,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		printk_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
 		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 	} while (0)
@@ -134,7 +133,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		BUG_ON(!in_nmi());				\
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
-		printk_nmi_exit();				\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 2476796c1150..77f66625706e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -149,18 +149,6 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
 struct dev_printk_info;
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 096e1af5c586..ea58c0d30a97 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1482,11 +1482,6 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
-config PRINTK_NMI
-	def_bool y
-	depends on PRINTK
-	depends on HAVE_NMI
-
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6cc35c5de890..ff890ae3ee6a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -6,12 +6,6 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x007ffffff
-#define PRINTK_NMI_DIRECT_CONTEXT_MASK	0x008000000
-#define PRINTK_NMI_CONTEXT_MASK		0xff0000000
-
-#define PRINTK_NMI_CONTEXT_OFFSET	0x010000000
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 4b5df5c27334..4da1ab3332d6 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -4,12 +4,9 @@
  */
 
 #include <linux/preempt.h>
-#include <linux/spinlock.h>
-#include <linux/debug_locks.h>
 #include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
-#include <linux/irq_work.h>
 #include <linux/printk.h>
 #include <linux/kprobes.h>
 
@@ -17,35 +14,6 @@
 
 static DEFINE_PER_CPU(int, printk_context);
 
-#ifdef CONFIG_PRINTK_NMI
-void noinstr printk_nmi_enter(void)
-{
-	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-void noinstr printk_nmi_exit(void)
-{
-	this_cpu_sub(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-/*
- * Marks a code that might produce many messages in NMI context
- * and the risk of losing them is more critical than eventual
- * reordering.
- */
-void printk_nmi_direct_enter(void)
-{
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-void printk_nmi_direct_exit(void)
-{
-	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-#endif /* CONFIG_PRINTK_NMI */
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -70,10 +38,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) &
-	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
-	     PRINTK_NMI_CONTEXT_MASK |
-	     PRINTK_SAFE_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) || in_nmi()) {
 		unsigned long flags;
 		int len;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b5815a022ecc..83a3663d1dc0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9344,7 +9344,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	tracing_off();
 
 	local_irq_save(flags);
-	printk_nmi_direct_enter();
 
 	/* Simulate the iterator */
 	trace_init_global_iter(&iter);
@@ -9424,7 +9423,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 		atomic_dec(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
 	}
 	atomic_dec(&dump_running);
-	printk_nmi_direct_exit();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 4/5] printk: convert @syslog_lock to mutex
  2021-03-30 15:35 ` John Ogness
                   ` (4 preceding siblings ...)
  (?)
@ 2021-03-30 15:35 ` John Ogness
  2021-04-01 15:13   ` Petr Mladek
  -1 siblings, 1 reply; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

@syslog_lock was a raw_spin_lock to simplify the transition of
removing @logbuf_lock and the safe buffers. With that transition
complete, and since all uses of @syslog_lock are within sleepable
contexts, @syslog_lock can become a mutex.

Note that until now register_console() would disable interrupts
using irqsave, which implies that it may be called with interrupts
disabled. And indeed, there is one possible call chain on parisc
where this happens:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
    pdc_console_init_force()
      register_console()

However, register_console() calls console_lock(), which might sleep.
So it has never been allowed to call register_console() from an
atomic context and the above call chain is a bug.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 Note: The removal of read_syslog_seq_irq() is technically a small
       step backwards. But the follow-up patch moves forward again
       and closes a window that existed with read_syslog_seq_irq()
       and @syslog_lock as a spin_lock.

 kernel/printk/printk.c | 49 +++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f090d6a1b39e..b771aae46445 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,7 +356,7 @@ enum log_flags {
 };
 
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
-static DEFINE_RAW_SPINLOCK(syslog_lock);
+static DEFINE_MUTEX(syslog_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1497,9 +1497,9 @@ static int syslog_print(char __user *buf, int size)
 		size_t n;
 		size_t skip;
 
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
-			raw_spin_unlock_irq(&syslog_lock);
+			mutex_unlock(&syslog_lock);
 			break;
 		}
 		if (r.info->seq != syslog_seq) {
@@ -1528,7 +1528,7 @@ static int syslog_print(char __user *buf, int size)
 			syslog_partial += n;
 		} else
 			n = 0;
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 
 		if (!n)
 			break;
@@ -1592,9 +1592,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 	}
 
 	if (clear) {
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		latched_seq_write(&clear_seq, seq);
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 	}
 
 	kfree(text);
@@ -1603,21 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 
 static void syslog_clear(void)
 {
-	raw_spin_lock_irq(&syslog_lock);
+	mutex_lock(&syslog_lock);
 	latched_seq_write(&clear_seq, prb_next_seq(prb));
-	raw_spin_unlock_irq(&syslog_lock);
-}
-
-/* Return a consistent copy of @syslog_seq. */
-static u64 read_syslog_seq_irq(void)
-{
-	u64 seq;
-
-	raw_spin_lock_irq(&syslog_lock);
-	seq = syslog_seq;
-	raw_spin_unlock_irq(&syslog_lock);
-
-	return seq;
+	mutex_unlock(&syslog_lock);
 }
 
 int do_syslog(int type, char __user *buf, int len, int source)
@@ -1626,6 +1614,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
+	u64 seq;
 
 	error = check_syslog_permissions(type, source);
 	if (error)
@@ -1644,8 +1633,12 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		if (!access_ok(buf, len))
 			return -EFAULT;
 
-		error = wait_event_interruptible(log_wait,
-				prb_read_valid(prb, read_syslog_seq_irq(), NULL));
+		/* Get a consistent copy of @syslog_seq. */
+		mutex_lock(&syslog_lock);
+		seq = syslog_seq;
+		mutex_unlock(&syslog_lock);
+
+		error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
 		if (error)
 			return error;
 		error = syslog_print(buf, len);
@@ -1693,10 +1686,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
 			/* No unread messages. */
-			raw_spin_unlock_irq(&syslog_lock);
+			mutex_unlock(&syslog_lock);
 			return 0;
 		}
 		if (info.seq != syslog_seq) {
@@ -1714,7 +1707,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
 			unsigned int line_count;
-			u64 seq;
 
 			prb_for_each_info(syslog_seq, prb, seq, &info,
 					  &line_count) {
@@ -1724,7 +1716,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			}
 			error -= syslog_partial;
 		}
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -2925,7 +2917,6 @@ static int try_enable_new_console(struct console *newcon, bool user_specified)
  */
 void register_console(struct console *newcon)
 {
-	unsigned long flags;
 	struct console *bcon = NULL;
 	int err;
 
@@ -3030,9 +3021,9 @@ void register_console(struct console *newcon)
 		exclusive_console_stop_seq = console_seq;
 
 		/* Get a consistent copy of @syslog_seq. */
-		raw_spin_lock_irqsave(&syslog_lock, flags);
+		mutex_lock(&syslog_lock);
 		console_seq = syslog_seq;
-		raw_spin_unlock_irqrestore(&syslog_lock, flags);
+		mutex_unlock(&syslog_lock);
 	}
 	console_unlock();
 	console_sysfs_notify();
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH printk v2 5/5] printk: syslog: close window between wait and read
  2021-03-30 15:35 ` John Ogness
                   ` (5 preceding siblings ...)
  (?)
@ 2021-03-30 15:35 ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-30 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

Syslog's SYSLOG_ACTION_READ is supposed to block until the next
syslog record can be read, and then it should read that record.
However, because @syslog_lock is not held between waking up and
reading the record, another reader could read the record first,
thus causing SYSLOG_ACTION_READ to return with a value of 0, never
having read _anything_.

By holding @syslog_lock between waking up and reading, it can be
guaranteed that SYSLOG_ACTION_READ blocks until it successfully
reads a syslog record (or a real error occurs).

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 50 +++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b771aae46445..bd23f00ebc32 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1486,6 +1486,7 @@ static int syslog_print(char __user *buf, int size)
 	struct printk_record r;
 	char *text;
 	int len = 0;
+	u64 seq;
 
 	text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
 	if (!text)
@@ -1493,11 +1494,38 @@ static int syslog_print(char __user *buf, int size)
 
 	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
-	while (size > 0) {
+	/* Get a consistent copy of @syslog_seq. */
+	mutex_lock(&syslog_lock);
+	seq = syslog_seq;
+	mutex_unlock(&syslog_lock);
+
+	/* Wait for the @syslog_seq record to be available. */
+	for (;;) {
+		len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
+		if (len)
+			goto out;
+
+		/*
+		 * @syslog_seq may have changed while waiting. If so, wait
+		 * for the new @syslog_seq record.
+		 */
+
+		mutex_lock(&syslog_lock);
+		if (syslog_seq == seq)
+			break;
+		seq = syslog_seq;
+		mutex_unlock(&syslog_lock);
+	}
+
+	/*
+	 * @syslog_lock is held when entering the read loop to prevent
+	 * another reader from modifying @syslog_seq.
+	 */
+
+	for (;;) {
 		size_t n;
 		size_t skip;
 
-		mutex_lock(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
 			mutex_unlock(&syslog_lock);
 			break;
@@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
 		len += n;
 		size -= n;
 		buf += n;
-	}
 
+		if (!size)
+			break;
+
+		mutex_lock(&syslog_lock);
+	}
+out:
 	kfree(text);
 	return len;
 }
@@ -1614,7 +1647,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
-	u64 seq;
 
 	error = check_syslog_permissions(type, source);
 	if (error)
@@ -1632,15 +1664,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			return 0;
 		if (!access_ok(buf, len))
 			return -EFAULT;
-
-		/* Get a consistent copy of @syslog_seq. */
-		mutex_lock(&syslog_lock);
-		seq = syslog_seq;
-		mutex_unlock(&syslog_lock);
-
-		error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
-		if (error)
-			return error;
 		error = syslog_print(buf, len);
 		break;
 	/* Read/clear last kernel messages */
@@ -1707,6 +1730,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
 			unsigned int line_count;
+			u64 seq;
 
 			prb_for_each_info(syslog_seq, prb, seq, &info,
 					  &line_count) {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-03-30 15:35   ` John Ogness
@ 2021-03-31  7:59     ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-31  7:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On 2021-03-30, John Ogness <john.ogness@linutronix.de> wrote:
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e971c0a9ec9e..f090d6a1b39e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
>  static bool console_waiter;
>  
>  /**
> - * console_lock_spinning_enable - mark beginning of code where another
> + * console_lock_spinning_enable_irqsave - mark beginning of code where another
>   *	thread might safely busy wait
>   *
>   * This basically converts console_lock into a spinlock. This marks
>   * the section where the console_lock owner can not sleep, because
>   * there may be a waiter spinning (like a spinlock). Also it must be
>   * ready to hand over the lock at the end of the section.
> + *
> + * This disables interrupts because the hand over to a waiter must not be
> + * interrupted until the hand over is completed (@console_waiter is cleared).
>   */
> -static void console_lock_spinning_enable(void)
> +static void console_lock_spinning_enable_irqsave(unsigned long *flags)

I missed the prototype change for the !CONFIG_PRINTK case, resulting in:

linux/kernel/printk/printk.c:2707:3: error: implicit declaration of function ‘console_lock_spinning_enable_irqsave’; did you mean ‘console_lock_spinning_enable’? [-Werror=implicit-function-declaration]
   console_lock_spinning_enable_irqsave(&flags);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   console_lock_spinning_enable

Will be fixed for v3.

(I have now officially added !CONFIG_PRINTK to my CI tests.)

John Ogness

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-03-31  7:59     ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-03-31  7:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

On 2021-03-30, John Ogness <john.ogness@linutronix.de> wrote:
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e971c0a9ec9e..f090d6a1b39e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
>  static bool console_waiter;
>  
>  /**
> - * console_lock_spinning_enable - mark beginning of code where another
> + * console_lock_spinning_enable_irqsave - mark beginning of code where another
>   *	thread might safely busy wait
>   *
>   * This basically converts console_lock into a spinlock. This marks
>   * the section where the console_lock owner can not sleep, because
>   * there may be a waiter spinning (like a spinlock). Also it must be
>   * ready to hand over the lock at the end of the section.
> + *
> + * This disables interrupts because the hand over to a waiter must not be
> + * interrupted until the hand over is completed (@console_waiter is cleared).
>   */
> -static void console_lock_spinning_enable(void)
> +static void console_lock_spinning_enable_irqsave(unsigned long *flags)

I missed the prototype change for the !CONFIG_PRINTK case, resulting in:

linux/kernel/printk/printk.c:2707:3: error: implicit declaration of function ‘console_lock_spinning_enable_irqsave’; did you mean ‘console_lock_spinning_enable’? [-Werror=implicit-function-declaration]
   console_lock_spinning_enable_irqsave(&flags);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   console_lock_spinning_enable

Will be fixed for v3.

(I have now officially added !CONFIG_PRINTK to my CI tests.)

John Ogness

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 1/5] printk: track/limit recursion
  2021-03-30 15:35 ` [PATCH printk v2 1/5] printk: track/limit recursion John Ogness
@ 2021-04-01 10:00   ` Petr Mladek
  2021-04-02  2:03     ` Sergey Senozhatsky
  0 siblings, 1 reply; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 10:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

On Tue 2021-03-30 17:35:08, John Ogness wrote:
> Currently the printk safe buffers provide a form of recursion
> protection by redirecting to the safe buffers whenever printk() is
> recursively called.
> 
> In preparation for removal of the safe buffers, provide an alternate
> explicit recursion protection. Recursion is limited to 3 levels
> per-CPU and per-context.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-03-30 15:35   ` John Ogness
@ 2021-04-01 12:21     ` Petr Mladek
  -1 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 12:21 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On Tue 2021-03-30 17:35:09, John Ogness wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
> 
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
> 
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console lock is left in place. This is
> because the console lock is needed for the actual printing.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>  		 new_descs, ilog2(new_descs_count),
>  		 new_infos);
>  
> -	printk_safe_enter_irqsave(flags);
> +	local_irq_save(flags);

IMHO, we actually do not have to disable IRQ here. We already copy
messages that might appear in the small race window in NMI. It would work
the same way also for IRQ context.

>  	log_buf_len = new_log_buf_len;
>  	log_buf = new_log_buf;
>  	new_log_buf_len = 0;
>  
>  	free = __LOG_BUF_LEN;
> -	prb_for_each_record(0, &printk_rb_static, seq, &r)
> -		free -= add_to_rb(&printk_rb_dynamic, &r);
> +	prb_for_each_record(0, &printk_rb_static, seq, &r) {
> +		text_size = add_to_rb(&printk_rb_dynamic, &r);
> +		if (text_size > free)
> +			free = 0;
> +		else
> +			free -= text_size;
> +	}
>  
> -	/*
> -	 * This is early enough that everything is still running on the
> -	 * boot CPU and interrupts are disabled. So no new messages will
> -	 * appear during the transition to the dynamic buffer.
> -	 */
>  	prb = &printk_rb_dynamic;
>  
> -	printk_safe_exit_irqrestore(flags);
> +	local_irq_restore(flags);
> +
> +	/*
> +	 * Copy any remaining messages that might have appeared from
> +	 * NMI context after copying but before switching to the
> +	 * dynamic buffer.

The above comment would need to get updated if we keep also normal
IRQ enabled when copying the log buffers.

> +	 */
> +	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
> +		text_size = add_to_rb(&printk_rb_dynamic, &r);
> +		if (text_size > free)
> +			free = 0;
> +		else
> +			free -= text_size;
> +	}
>  
>  	if (seq != prb_next_seq(&printk_rb_static)) {
>  		pr_err("dropped %llu messages\n",

> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  		touch_softlockup_watchdog();
>  	}
>  
> -	/*
> -	 * Force flush any remote buffers that might be stuck in IRQ context
> -	 * and therefore could not run their irq_work.
> -	 */
> -	printk_safe_flush();

Sigh, this reminds me that the nmi_safe buffers serialized backtraces
from all CPUs.

I am afraid that we have to put back the spinlock into
nmi_cpu_backtrace(). It has been repeatedly added and removed depending
on whether the backtrace was printed into the main log buffer
or into the per-CPU buffers. Last time it was removed by
the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
when accessing the main log buffer in NMI").

It should be safe because there should not be any other locks in the
code path. Note that only one backtrace might be triggered at the same
time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
looks less evil than the nmi_safe machinery. nmi_safe() shrinks
too long backtraces, lose timestamps, needs to be explicitely
flushed here and there, is a non-trivial code.

[*] Non-serialized bactraces are real mess. Caller-id is visible
    only on consoles or via syslogd interface. And it is not much
    convenient.

    I get this with "echo l >/proc/sysrq-trigger" and this patchset:

[   95.642793] sysrq: Show backtrace of all active CPUs
[   95.645202] NMI backtrace for cpu 3
[   95.646778] CPU: 3 PID: 5095 Comm: bash Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.650397] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.656497] Call Trace:
[   95.657937]  dump_stack+0x88/0xab
[   95.659888]  nmi_cpu_backtrace+0xa4/0xc0
[   95.661744]  ? lapic_can_unplug_cpu+0xa0/0xa0
[   95.663658]  nmi_trigger_cpumask_backtrace+0xe6/0x120
[   95.665657]  arch_trigger_cpumask_backtrace+0x19/0x20
[   95.667720]  sysrq_handle_showallcpus+0x17/0x20
[   95.670218]  __handle_sysrq+0xe1/0x240
[   95.672190]  write_sysrq_trigger+0x51/0x60
[   95.673993]  proc_reg_write+0x62/0x90
[   95.675319]  vfs_write+0xed/0x380
[   95.676636]  ksys_write+0xad/0xf0
[   95.677835]  __x64_sys_write+0x1a/0x20
[   95.678722]  do_syscall_64+0x37/0x50
[   95.679525]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   95.680571] RIP: 0033:0x7f3cbc2b3d44
[   95.681380] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 ea fa 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[   95.684456] RSP: 002b:00007ffe29f06018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   95.686029] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3cbc2b3d44
[   95.687346] RDX: 0000000000000002 RSI: 000055ad7117b420 RDI: 0000000000000001
[   95.688690] RBP: 000055ad7117b420 R08: 000000000000000a R09: 0000000000000000
[   95.690071] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000002
[   95.691243] R13: 0000000000000001 R14: 00007f3cbc57f720 R15: 0000000000000002
[   95.692318] Sending NMI from CPU 3 to CPUs 0-2:
[   95.693014] NMI backtrace for cpu 2
[   95.693014] NMI backtrace for cpu 1
[   95.693016] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.693014] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0x12/0x20
[   95.693016] CPU: 1 PID: 448 Comm: systemd-journal Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.693018] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.693019] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.693020] RIP: 0010:ttwu_do_wakeup+0x1aa/0x220
[   95.693021] RIP: 0010:inode_permission+0x1d/0x150
[   95.693025] Code: f0 48 39 c1 72 1b 48 89 83 c8 0b 00 00 48 c7 83 c0 0b 00 00 00 00 00 00 5b 41 5c 41 5d 41 5e 5d c3 48 89 8b c8 0b 00 00 eb e3 <48> 8d 7b 18 be ff ff ff ff e8 68 95 c4 00 85 c0 75 80 0f 0b e9 79
[   95.693025] Code: ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 41 83 e5 02 75 43 f6 47 02 01 <41> 89 f4 48 89 fb 0f 84 80 00 00 00 44 89 e6 48 89 df e8 dc fd ff
[   95.693027] RSP: 0018:ffffbae700120ef8 EFLAGS: 00000002
[   95.693028] RSP: 0018:ffffbae7003ebea0 EFLAGS: 00000202
[   95.693028] RAX: 0000000000000001 RBX: ffff9eb2ffbebec0 RCX: 0000000000000000
[   95.693030] RDX: 0000000000010003 RSI: ffffffffaa6a9860 RDI: ffff9eb2803351d0
[   95.693031] RAX: ffff9eb28229fa80 RBX: 0000000000000001 RCX: 0000000000000000
[   95.693031] RBP: ffffbae700120f18 R08: 0000000000000001 R09: 0000000000000001
[   95.693032] R10: ffff9eb282f00850 R11: 000000000000028d R12: ffff9eb282f00780
[   95.693032] RDX: 0000000000000001 RSI: 0000000000000010 RDI: ffff9eb286df5298
[   95.693033] R13: ffffbae700120f60 R14: ffffbae700120f60 R15: 0000000000000000
[   95.693034] RBP: ffffbae7003ebec0 R08: 0000000000000001 R09: 0000000000000001
[   95.693035] FS:  0000000000000000(0000) GS:ffff9eb2ffa00000(0000) knlGS:0000000000000000
[   95.693036] R10: ffffbae7003ebea8 R11: 0000000000000001 R12: 0000000000000000
[   95.693037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.693037] R13: 0000000000000000 R14: 000055a9f3817900 R15: 0000000000000000
[   95.693038] CR2: 000055e2be208280 CR3: 0000000103838003 CR4: 0000000000370ee0
[   95.693039] FS:  00007f1682ccb1c0(0000) GS:ffff9eb2ff800000(0000) knlGS:0000000000000000
[   95.693041] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.693042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.693043] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.693043] CR2: 00007f3cbbfb36a8 CR3: 0000000102708005 CR4: 0000000000370ee0
[   95.693044] Call Trace:
[   95.693045]  <IRQ>
[   95.693047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.693047]  ttwu_do_activate+0x90/0x190
[   95.693049] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.693050] Call Trace:
[   95.693051]  sched_ttwu_pending+0xe6/0x180
[   95.693054]  do_faccessat+0xbb/0x260
[   95.693055]  flush_smp_call_function_queue+0x117/0x220
[   95.693058]  generic_smp_call_function_single_interrupt+0x13/0x20
[   95.693060]  __x64_sys_access+0x1d/0x20
[   95.693060]  __sysvec_call_function_single+0x47/0x190
[   95.693063]  asm_call_irq_on_stack+0xf/0x20
[   95.693064]  do_syscall_64+0x37/0x50
[   95.693066]  </IRQ>
[   95.693067]  sysvec_call_function_single+0x6d/0xb0
[   95.693068]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   95.693070]  asm_sysvec_call_function_single+0x12/0x20
[   95.693071] RIP: 0033:0x7f1681d6be1a
[   95.693072] RIP: 0010:native_safe_halt+0x12/0x20
[   95.693074] Code: 48 8b 15 81 a0 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 63 f6 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 f3 c3 0f 1f 40 00 48 8b 15 49 a0 2c 00 f7
[   95.693074] Code: 00 0f 00 2d 92 4f 48 00 f4 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d 72 4f 48 00 fb f4 <5d> c3 cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 65 8b
[   95.693076] RSP: 002b:00007fff61a07828 EFLAGS: 00000246
[   95.693078] RSP: 0018:ffffbae7000b7e90 EFLAGS: 00000206
[   95.693080] RAX: ffffffffa8d86240 RBX: 0000000000000002 RCX: 0000000000000000
[   95.693080] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa8d865f5
[   95.693078]  ORIG_RAX: 0000000000000015
[   95.693081] RBP: ffffbae7000b7e90 R08: 0000000000000001 R09: 0000000000000001
[   95.693082] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffaa9bdf60
[   95.693082] RAX: ffffffffffffffda RBX: 00007fff61a0a550 RCX: 00007f1681d6be1a
[   95.693083] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9eb280334400
[   95.693084] RDX: 00007f16827d70e0 RSI: 0000000000000000 RDI: 000055a9f3817900
[   95.693085] RBP: 00007fff61a07870 R08: 0000000000000000 R09: 0000000000000000
[   95.693085]  ? __cpuidle_text_start+0x8/0x8
[   95.693086] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   95.693087] R13: 0000000000000000 R14: 00007fff61a07970 R15: 0000000000000000
[   95.693087]  ? default_idle_call+0x45/0x200
[   95.693091]  default_idle+0xe/0x20
[   95.693093]  arch_cpu_idle+0x15/0x20
[   95.693094]  default_idle_call+0x6c/0x200
[   95.693096]  do_idle+0x1fb/0x2e0
[   95.693098]  ? do_idle+0x1d9/0x2e0
[   95.693100]  cpu_startup_entry+0x1d/0x20
[   95.693102]  start_secondary+0x12b/0x160
[   95.693105]  secondary_startup_64_no_verify+0xc2/0xcb


Otherwise, I really love this patch removing a lot of tricky code.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-01 12:21     ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 12:21 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

On Tue 2021-03-30 17:35:09, John Ogness wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
> 
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
> 
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console lock is left in place. This is
> because the console lock is needed for the actual printing.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>  		 new_descs, ilog2(new_descs_count),
>  		 new_infos);
>  
> -	printk_safe_enter_irqsave(flags);
> +	local_irq_save(flags);

IMHO, we actually do not have to disable IRQ here. We already copy
messages that might appear in the small race window in NMI. It would work
the same way also for IRQ context.

>  	log_buf_len = new_log_buf_len;
>  	log_buf = new_log_buf;
>  	new_log_buf_len = 0;
>  
>  	free = __LOG_BUF_LEN;
> -	prb_for_each_record(0, &printk_rb_static, seq, &r)
> -		free -= add_to_rb(&printk_rb_dynamic, &r);
> +	prb_for_each_record(0, &printk_rb_static, seq, &r) {
> +		text_size = add_to_rb(&printk_rb_dynamic, &r);
> +		if (text_size > free)
> +			free = 0;
> +		else
> +			free -= text_size;
> +	}
>  
> -	/*
> -	 * This is early enough that everything is still running on the
> -	 * boot CPU and interrupts are disabled. So no new messages will
> -	 * appear during the transition to the dynamic buffer.
> -	 */
>  	prb = &printk_rb_dynamic;
>  
> -	printk_safe_exit_irqrestore(flags);
> +	local_irq_restore(flags);
> +
> +	/*
> +	 * Copy any remaining messages that might have appeared from
> +	 * NMI context after copying but before switching to the
> +	 * dynamic buffer.

The above comment would need to get updated if we keep also normal
IRQ enabled when copying the log buffers.

> +	 */
> +	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
> +		text_size = add_to_rb(&printk_rb_dynamic, &r);
> +		if (text_size > free)
> +			free = 0;
> +		else
> +			free -= text_size;
> +	}
>  
>  	if (seq != prb_next_seq(&printk_rb_static)) {
>  		pr_err("dropped %llu messages\n",

> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  		touch_softlockup_watchdog();
>  	}
>  
> -	/*
> -	 * Force flush any remote buffers that might be stuck in IRQ context
> -	 * and therefore could not run their irq_work.
> -	 */
> -	printk_safe_flush();

Sigh, this reminds me that the nmi_safe buffers serialized backtraces
from all CPUs.

I am afraid that we have to put back the spinlock into
nmi_cpu_backtrace(). It has been repeatedly added and removed depending
on whether the backtrace was printed into the main log buffer
or into the per-CPU buffers. Last time it was removed by
the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
when accessing the main log buffer in NMI").

It should be safe because there should not be any other locks in the
code path. Note that only one backtrace might be triggered at the same
time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
looks less evil than the nmi_safe machinery. nmi_safe() shrinks
too long backtraces, lose timestamps, needs to be explicitely
flushed here and there, is a non-trivial code.

[*] Non-serialized bactraces are real mess. Caller-id is visible
    only on consoles or via syslogd interface. And it is not much
    convenient.

    I get this with "echo l >/proc/sysrq-trigger" and this patchset:

[   95.642793] sysrq: Show backtrace of all active CPUs
[   95.645202] NMI backtrace for cpu 3
[   95.646778] CPU: 3 PID: 5095 Comm: bash Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.650397] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.656497] Call Trace:
[   95.657937]  dump_stack+0x88/0xab
[   95.659888]  nmi_cpu_backtrace+0xa4/0xc0
[   95.661744]  ? lapic_can_unplug_cpu+0xa0/0xa0
[   95.663658]  nmi_trigger_cpumask_backtrace+0xe6/0x120
[   95.665657]  arch_trigger_cpumask_backtrace+0x19/0x20
[   95.667720]  sysrq_handle_showallcpus+0x17/0x20
[   95.670218]  __handle_sysrq+0xe1/0x240
[   95.672190]  write_sysrq_trigger+0x51/0x60
[   95.673993]  proc_reg_write+0x62/0x90
[   95.675319]  vfs_write+0xed/0x380
[   95.676636]  ksys_write+0xad/0xf0
[   95.677835]  __x64_sys_write+0x1a/0x20
[   95.678722]  do_syscall_64+0x37/0x50
[   95.679525]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   95.680571] RIP: 0033:0x7f3cbc2b3d44
[   95.681380] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 ea fa 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[   95.684456] RSP: 002b:00007ffe29f06018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   95.686029] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3cbc2b3d44
[   95.687346] RDX: 0000000000000002 RSI: 000055ad7117b420 RDI: 0000000000000001
[   95.688690] RBP: 000055ad7117b420 R08: 000000000000000a R09: 0000000000000000
[   95.690071] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000002
[   95.691243] R13: 0000000000000001 R14: 00007f3cbc57f720 R15: 0000000000000002
[   95.692318] Sending NMI from CPU 3 to CPUs 0-2:
[   95.693014] NMI backtrace for cpu 2
[   95.693014] NMI backtrace for cpu 1
[   95.693016] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.693014] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0x12/0x20
[   95.693016] CPU: 1 PID: 448 Comm: systemd-journal Kdump: loaded Tainted: G        W         5.11.0-default+ #231
[   95.693018] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.693019] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   95.693020] RIP: 0010:ttwu_do_wakeup+0x1aa/0x220
[   95.693021] RIP: 0010:inode_permission+0x1d/0x150
[   95.693025] Code: f0 48 39 c1 72 1b 48 89 83 c8 0b 00 00 48 c7 83 c0 0b 00 00 00 00 00 00 5b 41 5c 41 5d 41 5e 5d c3 48 89 8b c8 0b 00 00 eb e3 <48> 8d 7b 18 be ff ff ff ff e8 68 95 c4 00 85 c0 75 80 0f 0b e9 79
[   95.693025] Code: ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 41 83 e5 02 75 43 f6 47 02 01 <41> 89 f4 48 89 fb 0f 84 80 00 00 00 44 89 e6 48 89 df e8 dc fd ff
[   95.693027] RSP: 0018:ffffbae700120ef8 EFLAGS: 00000002
[   95.693028] RSP: 0018:ffffbae7003ebea0 EFLAGS: 00000202
[   95.693028] RAX: 0000000000000001 RBX: ffff9eb2ffbebec0 RCX: 0000000000000000
[   95.693030] RDX: 0000000000010003 RSI: ffffffffaa6a9860 RDI: ffff9eb2803351d0
[   95.693031] RAX: ffff9eb28229fa80 RBX: 0000000000000001 RCX: 0000000000000000
[   95.693031] RBP: ffffbae700120f18 R08: 0000000000000001 R09: 0000000000000001
[   95.693032] R10: ffff9eb282f00850 R11: 000000000000028d R12: ffff9eb282f00780
[   95.693032] RDX: 0000000000000001 RSI: 0000000000000010 RDI: ffff9eb286df5298
[   95.693033] R13: ffffbae700120f60 R14: ffffbae700120f60 R15: 0000000000000000
[   95.693034] RBP: ffffbae7003ebec0 R08: 0000000000000001 R09: 0000000000000001
[   95.693035] FS:  0000000000000000(0000) GS:ffff9eb2ffa00000(0000) knlGS:0000000000000000
[   95.693036] R10: ffffbae7003ebea8 R11: 0000000000000001 R12: 0000000000000000
[   95.693037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.693037] R13: 0000000000000000 R14: 000055a9f3817900 R15: 0000000000000000
[   95.693038] CR2: 000055e2be208280 CR3: 0000000103838003 CR4: 0000000000370ee0
[   95.693039] FS:  00007f1682ccb1c0(0000) GS:ffff9eb2ff800000(0000) knlGS:0000000000000000
[   95.693041] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.693042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.693043] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.693043] CR2: 00007f3cbbfb36a8 CR3: 0000000102708005 CR4: 0000000000370ee0
[   95.693044] Call Trace:
[   95.693045]  <IRQ>
[   95.693047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.693047]  ttwu_do_activate+0x90/0x190
[   95.693049] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.693050] Call Trace:
[   95.693051]  sched_ttwu_pending+0xe6/0x180
[   95.693054]  do_faccessat+0xbb/0x260
[   95.693055]  flush_smp_call_function_queue+0x117/0x220
[   95.693058]  generic_smp_call_function_single_interrupt+0x13/0x20
[   95.693060]  __x64_sys_access+0x1d/0x20
[   95.693060]  __sysvec_call_function_single+0x47/0x190
[   95.693063]  asm_call_irq_on_stack+0xf/0x20
[   95.693064]  do_syscall_64+0x37/0x50
[   95.693066]  </IRQ>
[   95.693067]  sysvec_call_function_single+0x6d/0xb0
[   95.693068]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   95.693070]  asm_sysvec_call_function_single+0x12/0x20
[   95.693071] RIP: 0033:0x7f1681d6be1a
[   95.693072] RIP: 0010:native_safe_halt+0x12/0x20
[   95.693074] Code: 48 8b 15 81 a0 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 63 f6 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 f3 c3 0f 1f 40 00 48 8b 15 49 a0 2c 00 f7
[   95.693074] Code: 00 0f 00 2d 92 4f 48 00 f4 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d 72 4f 48 00 fb f4 <5d> c3 cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 65 8b
[   95.693076] RSP: 002b:00007fff61a07828 EFLAGS: 00000246
[   95.693078] RSP: 0018:ffffbae7000b7e90 EFLAGS: 00000206
[   95.693080] RAX: ffffffffa8d86240 RBX: 0000000000000002 RCX: 0000000000000000
[   95.693080] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa8d865f5
[   95.693078]  ORIG_RAX: 0000000000000015
[   95.693081] RBP: ffffbae7000b7e90 R08: 0000000000000001 R09: 0000000000000001
[   95.693082] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffaa9bdf60
[   95.693082] RAX: ffffffffffffffda RBX: 00007fff61a0a550 RCX: 00007f1681d6be1a
[   95.693083] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9eb280334400
[   95.693084] RDX: 00007f16827d70e0 RSI: 0000000000000000 RDI: 000055a9f3817900
[   95.693085] RBP: 00007fff61a07870 R08: 0000000000000000 R09: 0000000000000000
[   95.693085]  ? __cpuidle_text_start+0x8/0x8
[   95.693086] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   95.693087] R13: 0000000000000000 R14: 00007fff61a07970 R15: 0000000000000000
[   95.693087]  ? default_idle_call+0x45/0x200
[   95.693091]  default_idle+0xe/0x20
[   95.693093]  arch_cpu_idle+0x15/0x20
[   95.693094]  default_idle_call+0x6c/0x200
[   95.693096]  do_idle+0x1fb/0x2e0
[   95.693098]  ? do_idle+0x1d9/0x2e0
[   95.693100]  cpu_startup_entry+0x1d/0x20
[   95.693102]  start_secondary+0x12b/0x160
[   95.693105]  secondary_startup_64_no_verify+0xc2/0xcb


Otherwise, I really love this patch removing a lot of tricky code.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-04-01 12:21     ` Petr Mladek
@ 2021-04-01 13:19       ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-04-01 13:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>>  		 new_descs, ilog2(new_descs_count),
>>  		 new_infos);
>>  
>> -	printk_safe_enter_irqsave(flags);
>> +	local_irq_save(flags);
>
> IMHO, we actually do not have to disable IRQ here. We already copy
> messages that might appear in the small race window in NMI. It would
> work the same way also for IRQ context.

We do not have to, but why open up this window? We are still in early
boot and interrupts have always been disabled here. I am not happy that
this window even exists. I really prefer to keep it NMI-only.

>> --- a/lib/nmi_backtrace.c
>> +++ b/lib/nmi_backtrace.c
>> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>>  		touch_softlockup_watchdog();
>>  	}
>>  
>> -	/*
>> -	 * Force flush any remote buffers that might be stuck in IRQ context
>> -	 * and therefore could not run their irq_work.
>> -	 */
>> -	printk_safe_flush();
>
> Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> from all CPUs.
>
> I am afraid that we have to put back the spinlock into
> nmi_cpu_backtrace().

Please no. That spinlock is a disaster. It can cause deadlocks with
other cpu-locks (such as in kdb) and it will cause a major problem for
atomic consoles. We need to be very careful about introducing locks
where NMIs are waiting on other CPUs.

> It has been repeatedly added and removed depending
> on whether the backtrace was printed into the main log buffer
> or into the per-CPU buffers. Last time it was removed by
> the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
>
> It should be safe because there should not be any other locks in the
> code path. Note that only one backtrace might be triggered at the same
> time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

It is adding a lock around a lockless ringbuffer. For me that is a step
backwards.

> We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> looks less evil than the nmi_safe machinery. nmi_safe() shrinks
> too long backtraces, lose timestamps, needs to be explicitely
> flushed here and there, is a non-trivial code.
>
> [*] Non-serialized bactraces are real mess. Caller-id is visible
>     only on consoles or via syslogd interface. And it is not much
>     convenient.

Caller-id solves this problem and is easy to sort for anyone with
`grep'. Yes, it is a shame that `dmesg' does not show it, but directly
using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
syslog, console).

>     I get this with "echo l >/proc/sysrq-trigger" and this patchset:

Of course. Without caller-id, it is a mess. But this has nothing to do
with NMI. The same problem exists for WARN_ON() on multiple CPUs
simultaneously. If the user is not using caller-id, they are
lost. Caller-id is the current solution to the interlaced logs.

For the long term, we should introduce a printk-context API that allows
callers to perfectly pack their multi-line output into a single
entry. We discussed [0][1] this back in August 2020.

John Ogness

[0] https://lore.kernel.org/lkml/472f2e553805b52d9834d64e4056db965edee329.camel@perches.com
[1] offlist message-id: 87d03k9ymz.fsf@jogness.linutronix.de

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-01 13:19       ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-04-01 13:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>>  		 new_descs, ilog2(new_descs_count),
>>  		 new_infos);
>>  
>> -	printk_safe_enter_irqsave(flags);
>> +	local_irq_save(flags);
>
> IMHO, we actually do not have to disable IRQ here. We already copy
> messages that might appear in the small race window in NMI. It would
> work the same way also for IRQ context.

We do not have to, but why open up this window? We are still in early
boot and interrupts have always been disabled here. I am not happy that
this window even exists. I really prefer to keep it NMI-only.

>> --- a/lib/nmi_backtrace.c
>> +++ b/lib/nmi_backtrace.c
>> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>>  		touch_softlockup_watchdog();
>>  	}
>>  
>> -	/*
>> -	 * Force flush any remote buffers that might be stuck in IRQ context
>> -	 * and therefore could not run their irq_work.
>> -	 */
>> -	printk_safe_flush();
>
> Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> from all CPUs.
>
> I am afraid that we have to put back the spinlock into
> nmi_cpu_backtrace().

Please no. That spinlock is a disaster. It can cause deadlocks with
other cpu-locks (such as in kdb) and it will cause a major problem for
atomic consoles. We need to be very careful about introducing locks
where NMIs are waiting on other CPUs.

> It has been repeatedly added and removed depending
> on whether the backtrace was printed into the main log buffer
> or into the per-CPU buffers. Last time it was removed by
> the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
>
> It should be safe because there should not be any other locks in the
> code path. Note that only one backtrace might be triggered at the same
> time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

It is adding a lock around a lockless ringbuffer. For me that is a step
backwards.

> We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> looks less evil than the nmi_safe machinery. nmi_safe() shrinks
> too long backtraces, lose timestamps, needs to be explicitely
> flushed here and there, is a non-trivial code.
>
> [*] Non-serialized bactraces are real mess. Caller-id is visible
>     only on consoles or via syslogd interface. And it is not much
>     convenient.

Caller-id solves this problem and is easy to sort for anyone with
`grep'. Yes, it is a shame that `dmesg' does not show it, but directly
using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
syslog, console).

>     I get this with "echo l >/proc/sysrq-trigger" and this patchset:

Of course. Without caller-id, it is a mess. But this has nothing to do
with NMI. The same problem exists for WARN_ON() on multiple CPUs
simultaneously. If the user is not using caller-id, they are
lost. Caller-id is the current solution to the interlaced logs.

For the long term, we should introduce a printk-context API that allows
callers to perfectly pack their multi-line output into a single
entry. We discussed [0][1] this back in August 2020.

John Ogness

[0] https://lore.kernel.org/lkml/472f2e553805b52d9834d64e4056db965edee329.camel@perches.com
[1] offlist message-id: 87d03k9ymz.fsf@jogness.linutronix.de

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-04-01 13:19       ` John Ogness
@ 2021-04-01 14:17         ` Petr Mladek
  -1 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 14:17 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On Thu 2021-04-01 15:19:52, John Ogness wrote:
> On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
> >>  		 new_descs, ilog2(new_descs_count),
> >>  		 new_infos);
> >>  
> >> -	printk_safe_enter_irqsave(flags);
> >> +	local_irq_save(flags);
> >
> > IMHO, we actually do not have to disable IRQ here. We already copy
> > messages that might appear in the small race window in NMI. It would
> > work the same way also for IRQ context.
> 
> We do not have to, but why open up this window? We are still in early
> boot and interrupts have always been disabled here. I am not happy that
> this window even exists. I really prefer to keep it NMI-only.

Fair enough.

> >> --- a/lib/nmi_backtrace.c
> >> +++ b/lib/nmi_backtrace.c
> >> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >>  		touch_softlockup_watchdog();
> >>  	}
> >>  
> >> -	/*
> >> -	 * Force flush any remote buffers that might be stuck in IRQ context
> >> -	 * and therefore could not run their irq_work.
> >> -	 */
> >> -	printk_safe_flush();
> >
> > Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> > from all CPUs.
> >
> > I am afraid that we have to put back the spinlock into
> > nmi_cpu_backtrace().
> 
> Please no. That spinlock is a disaster. It can cause deadlocks with
> other cpu-locks (such as in kdb)

Could you please explain more the kdb case?
I am curious what locks might depend on each other here.

> and it will cause a major problem for atomic consoles.

AFAIK, you are going to add a special lock that would allow
nesting on the same CPU. It should possible and safe
to use is also for synchronizing the backtraces here.


> We need to be very careful about introducing locks
> where NMIs are waiting on other CPUs.

I agree.


> > It has been repeatedly added and removed depending
> > on whether the backtrace was printed into the main log buffer
> > or into the per-CPU buffers. Last time it was removed by
> > the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> > when accessing the main log buffer in NMI").
> >
> > It should be safe because there should not be any other locks in the
> > code path. Note that only one backtrace might be triggered at the same
> > time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().
> 
> It is adding a lock around a lockless ringbuffer. For me that is a step
> backwards.
> 
> > We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> > looks less evil than the nmi_safe machinery. nmi_safe() shrinks
> > too long backtraces, lose timestamps, needs to be explicitely
> > flushed here and there, is a non-trivial code.
> >
> > [*] Non-serialized bactraces are real mess. Caller-id is visible
> >     only on consoles or via syslogd interface. And it is not much
> >     convenient.
> 
> Caller-id solves this problem and is easy to sort for anyone with
> `grep'. Yes, it is a shame that `dmesg' does not show it, but directly
> using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
> syslog, console).

True but frankly, the current situation is _far_ from convenient:

   + consoles do not show it by default
   + none userspace tool (dmesg, journalctl, crash) is able to show it
   + grep is a nightmare, especially if you have more than handful of CPUs

Yes, everything is solvable but not easily.

> >     I get this with "echo l >/proc/sysrq-trigger" and this patchset:
> 
> Of course. Without caller-id, it is a mess. But this has nothing to do
> with NMI. The same problem exists for WARN_ON() on multiple CPUs
> simultaneously. If the user is not using caller-id, they are
> lost. Caller-id is the current solution to the interlaced logs.

Sure. But in reality, the risk of mixed WARN_ONs is small. While
this patch makes backtraces from all CPUs always unusable without
caller_id and non-trivial effort.


> For the long term, we should introduce a printk-context API that allows
> callers to perfectly pack their multi-line output into a single
> entry. We discussed [0][1] this back in August 2020.

We need a "short" term solution. There are currently 3 solutions:

1. Keep nmi_safe() and all the hacks around.

2. Serialize nmi_cpu_backtrace() by a spin lock and later by
   the special lock used also by atomic consoles.

3. Tell complaining people how to sort the messed logs.


My preference:

I most prefer 2nd solution until I see a realistic scenario
of a possible deadlock with the current kernel code.

I would still prefer 1st solution over 3rd one until we improve
kernel/userspace support for sorting the log by the caller id.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-01 14:17         ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 14:17 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

On Thu 2021-04-01 15:19:52, John Ogness wrote:
> On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
> >>  		 new_descs, ilog2(new_descs_count),
> >>  		 new_infos);
> >>  
> >> -	printk_safe_enter_irqsave(flags);
> >> +	local_irq_save(flags);
> >
> > IMHO, we actually do not have to disable IRQ here. We already copy
> > messages that might appear in the small race window in NMI. It would
> > work the same way also for IRQ context.
> 
> We do not have to, but why open up this window? We are still in early
> boot and interrupts have always been disabled here. I am not happy that
> this window even exists. I really prefer to keep it NMI-only.

Fair enough.

> >> --- a/lib/nmi_backtrace.c
> >> +++ b/lib/nmi_backtrace.c
> >> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >>  		touch_softlockup_watchdog();
> >>  	}
> >>  
> >> -	/*
> >> -	 * Force flush any remote buffers that might be stuck in IRQ context
> >> -	 * and therefore could not run their irq_work.
> >> -	 */
> >> -	printk_safe_flush();
> >
> > Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> > from all CPUs.
> >
> > I am afraid that we have to put back the spinlock into
> > nmi_cpu_backtrace().
> 
> Please no. That spinlock is a disaster. It can cause deadlocks with
> other cpu-locks (such as in kdb)

Could you please explain more the kdb case?
I am curious what locks might depend on each other here.

> and it will cause a major problem for atomic consoles.

AFAIK, you are going to add a special lock that would allow
nesting on the same CPU. It should possible and safe
to use is also for synchronizing the backtraces here.


> We need to be very careful about introducing locks
> where NMIs are waiting on other CPUs.

I agree.


> > It has been repeatedly added and removed depending
> > on whether the backtrace was printed into the main log buffer
> > or into the per-CPU buffers. Last time it was removed by
> > the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> > when accessing the main log buffer in NMI").
> >
> > It should be safe because there should not be any other locks in the
> > code path. Note that only one backtrace might be triggered at the same
> > time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().
> 
> It is adding a lock around a lockless ringbuffer. For me that is a step
> backwards.
> 
> > We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> > looks less evil than the nmi_safe machinery. nmi_safe() shrinks
> > too long backtraces, lose timestamps, needs to be explicitely
> > flushed here and there, is a non-trivial code.
> >
> > [*] Non-serialized bactraces are real mess. Caller-id is visible
> >     only on consoles or via syslogd interface. And it is not much
> >     convenient.
> 
> Caller-id solves this problem and is easy to sort for anyone with
> `grep'. Yes, it is a shame that `dmesg' does not show it, but directly
> using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
> syslog, console).

True but frankly, the current situation is _far_ from convenient:

   + consoles do not show it by default
   + none userspace tool (dmesg, journalctl, crash) is able to show it
   + grep is a nightmare, especially if you have more than handful of CPUs

Yes, everything is solvable but not easily.

> >     I get this with "echo l >/proc/sysrq-trigger" and this patchset:
> 
> Of course. Without caller-id, it is a mess. But this has nothing to do
> with NMI. The same problem exists for WARN_ON() on multiple CPUs
> simultaneously. If the user is not using caller-id, they are
> lost. Caller-id is the current solution to the interlaced logs.

Sure. But in reality, the risk of mixed WARN_ONs is small. While
this patch makes backtraces from all CPUs always unusable without
caller_id and non-trivial effort.


> For the long term, we should introduce a printk-context API that allows
> callers to perfectly pack their multi-line output into a single
> entry. We discussed [0][1] this back in August 2020.

We need a "short" term solution. There are currently 3 solutions:

1. Keep nmi_safe() and all the hacks around.

2. Serialize nmi_cpu_backtrace() by a spin lock and later by
   the special lock used also by atomic consoles.

3. Tell complaining people how to sort the messed logs.


My preference:

I most prefer 2nd solution until I see a realistic scenario
of a possible deadlock with the current kernel code.

I would still prefer 1st solution over 3rd one until we improve
kernel/userspace support for sorting the log by the caller id.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 3/5] printk: remove NMI tracking
  2021-03-30 15:35   ` John Ogness
  (?)
@ 2021-04-01 14:37     ` Petr Mladek
  -1 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 14:37 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Mike Rapoport,
	Geert Uytterhoeven, Anshuman Khandual, Leonardo Bras,
	Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Paul E. McKenney, Masahiro Yamada,
	Nathan Chancellor, Alexei Starovoitov, Kees Cook, Nick Terrell,
	David Howells, Johannes Weiner, linux-arm-kernel, linuxppc-dev

On Tue 2021-03-30 17:35:10, John Ogness wrote:
> All NMI contexts are handled the same as the safe context: store the
> message and defer printing. There is no need to have special NMI
> context tracking for this. Using in_nmi() is enough.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

This is another great win!

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 3/5] printk: remove NMI tracking
@ 2021-04-01 14:37     ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 14:37 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Alexei Starovoitov,
	David Howells, Paul Mackerras, Will Deacon, Leonardo Bras,
	Marc Zyngier, Masahiro Yamada, Russell King, Ingo Molnar,
	Geert Uytterhoeven, Valentin Schneider, Kees Cook,
	Paul E. McKenney, Anshuman Khandual, Frederic Weisbecker,
	Johannes Weiner, Steven Rostedt, Nathan Chancellor, Nick Terrell,
	Andy Lutomirski, Thomas Gleixner, linux-arm-kernel, linux-kernel,
	Sergey Senozhatsky, Andrew Morton, linuxppc-dev, Mike Rapoport

On Tue 2021-03-30 17:35:10, John Ogness wrote:
> All NMI contexts are handled the same as the safe context: store the
> message and defer printing. There is no need to have special NMI
> context tracking for this. Using in_nmi() is enough.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

This is another great win!

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 3/5] printk: remove NMI tracking
@ 2021-04-01 14:37     ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 14:37 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Mike Rapoport,
	Geert Uytterhoeven, Anshuman Khandual, Leonardo Bras,
	Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Frederic Weisbecker, Paul E. McKenney, Masahiro Yamada,
	Nathan Chancellor, Alexei Starovoitov, Kees Cook, Nick Terrell,
	David Howells, Johannes Weiner, linux-arm-kernel, linuxppc-dev

On Tue 2021-03-30 17:35:10, John Ogness wrote:
> All NMI contexts are handled the same as the safe context: store the
> message and defer printing. There is no need to have special NMI
> context tracking for this. Using in_nmi() is enough.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

This is another great win!

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 4/5] printk: convert @syslog_lock to mutex
  2021-03-30 15:35 ` [PATCH printk v2 4/5] printk: convert @syslog_lock to mutex John Ogness
@ 2021-04-01 15:13   ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-01 15:13 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

On Tue 2021-03-30 17:35:11, John Ogness wrote:
> @syslog_lock was a raw_spin_lock to simplify the transition of
> removing @logbuf_lock and the safe buffers. With that transition
> complete, and since all uses of @syslog_lock are within sleepable
> contexts, @syslog_lock can become a mutex.

It makes perfect sense.

> ---
>  Note: The removal of read_syslog_seq_irq() is technically a small
>        step backwards. But the follow-up patch moves forward again
>        and closes a window that existed with read_syslog_seq_irq()
>        and @syslog_lock as a spin_lock.

This change would deserve a comment in the commit message. Well, I do
not think that is a step backward, see below.

>  kernel/printk/printk.c | 49 +++++++++++++++++-------------------------
>  1 file changed, 20 insertions(+), 29 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index f090d6a1b39e..b771aae46445 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1603,21 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
>  
>  static void syslog_clear(void)
>  {
> -	raw_spin_lock_irq(&syslog_lock);
> +	mutex_lock(&syslog_lock);
>  	latched_seq_write(&clear_seq, prb_next_seq(prb));
> -	raw_spin_unlock_irq(&syslog_lock);
> -}
> -
> -/* Return a consistent copy of @syslog_seq. */
> -static u64 read_syslog_seq_irq(void)
> -{
> -	u64 seq;
> -
> -	raw_spin_lock_irq(&syslog_lock);
> -	seq = syslog_seq;
> -	raw_spin_unlock_irq(&syslog_lock);
> -
> -	return seq;
> +	mutex_unlock(&syslog_lock);
>  }
>  
>  int do_syslog(int type, char __user *buf, int len, int source)
> @@ -1644,8 +1633,12 @@ int do_syslog(int type, char __user *buf, int len, int source)
>  		if (!access_ok(buf, len))
>  			return -EFAULT;
>  
> -		error = wait_event_interruptible(log_wait,
> -				prb_read_valid(prb, read_syslog_seq_irq(), NULL));
> +		/* Get a consistent copy of @syslog_seq. */
> +		mutex_lock(&syslog_lock);
> +		seq = syslog_seq;
> +		mutex_unlock(&syslog_lock);
> +
> +		error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));

Honestly, I am not sure how the syslog interface should work when there
are two readers in the system. They both share the same "syslog_seq".

This either fixes a historic bug. The caller of SYSLOG_ACTION_READ
might miss the new message when another reader did read it in the
meantime.

Or it might introduce a regression when two readers would read the
same message.

Or it does not matter because the behavior is racy by definition.

Best Regards,
Petr

PS: I am going to look at this more with a fresh head after Easter
 holidays.  The answer is important also for the next patch that
 basically restores the original behavior.
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 1/5] printk: track/limit recursion
  2021-04-01 10:00   ` Petr Mladek
@ 2021-04-02  2:03     ` Sergey Senozhatsky
  0 siblings, 0 replies; 31+ messages in thread
From: Sergey Senozhatsky @ 2021-04-02  2:03 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, linux-kernel

On (21/04/01 12:00), Petr Mladek wrote:
> On Tue 2021-03-30 17:35:08, John Ogness wrote:
> > Currently the printk safe buffers provide a form of recursion
> > protection by redirecting to the safe buffers whenever printk() is
> > recursively called.
> > 
> > In preparation for removal of the safe buffers, provide an alternate
> > explicit recursion protection. Recursion is limited to 3 levels
> > per-CPU and per-context.
> > 
> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-04-01 14:17         ` Petr Mladek
@ 2021-04-02  2:14           ` Sergey Senozhatsky
  -1 siblings, 0 replies; 31+ messages in thread
From: Sergey Senozhatsky @ 2021-04-02  2:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On (21/04/01 16:17), Petr Mladek wrote:
> > For the long term, we should introduce a printk-context API that allows
> > callers to perfectly pack their multi-line output into a single
> > entry. We discussed [0][1] this back in August 2020.
> 
> We need a "short" term solution. There are currently 3 solutions:
> 
> 1. Keep nmi_safe() and all the hacks around.
> 
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>    the special lock used also by atomic consoles.
> 
> 3. Tell complaining people how to sort the messed logs.

Are we talking about nmi_cpu_backtrace()->dump_stack() or some
other path?

dump_stack() seems to be already serialized by `dump_lock`. Hmm,
show_regs() is not serialized, seems like it should be under the
same `dump_lock` as dump_stack().

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-02  2:14           ` Sergey Senozhatsky
  0 siblings, 0 replies; 31+ messages in thread
From: Sergey Senozhatsky @ 2021-04-02  2:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Paul Mackerras, Tiezhu Yang,
	Rafael Aquini, Paul E. McKenney, Aneesh Kumar K.V,
	Alexey Kardashevskiy, Yue Hu, Jordan Niethe, Kees Cook,
	John Ogness, Alistair Popple, Guilherme G. Piccoli,
	Nicholas Piggin, Steven Rostedt, Thomas Gleixner, kexec,
	linux-kernel, Sergey Senozhatsky, Eric Biederman, Andrew Morton,
	linuxppc-dev

On (21/04/01 16:17), Petr Mladek wrote:
> > For the long term, we should introduce a printk-context API that allows
> > callers to perfectly pack their multi-line output into a single
> > entry. We discussed [0][1] this back in August 2020.
> 
> We need a "short" term solution. There are currently 3 solutions:
> 
> 1. Keep nmi_safe() and all the hacks around.
> 
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>    the special lock used also by atomic consoles.
> 
> 3. Tell complaining people how to sort the messed logs.

Are we talking about nmi_cpu_backtrace()->dump_stack() or some
other path?

dump_stack() seems to be already serialized by `dump_lock`. Hmm,
show_regs() is not serialized, seems like it should be under the
same `dump_lock` as dump_stack().

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-04-01 14:17         ` Petr Mladek
@ 2021-04-06 11:01           ` John Ogness
  -1 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-04-06 11:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
>> Caller-id solves this problem and is easy to sort for anyone with
>> `grep'. Yes, it is a shame that `dmesg' does not show it, but
>> directly using any of the printk interfaces does show it (kmsg_dump,
>> /dev/kmsg, syslog, console).
>
> True but frankly, the current situation is _far_ from convenient:
>
>    + consoles do not show it by default
>    + none userspace tool (dmesg, journalctl, crash) is able to show it
>    + grep is a nightmare, especially if you have more than handful of CPUs
>
> Yes, everything is solvable but not easily.
>
>> >     I get this with "echo l >/proc/sysrq-trigger" and this patchset:
>> 
>> Of course. Without caller-id, it is a mess. But this has nothing to do
>> with NMI. The same problem exists for WARN_ON() on multiple CPUs
>> simultaneously. If the user is not using caller-id, they are
>> lost. Caller-id is the current solution to the interlaced logs.
>
> Sure. But in reality, the risk of mixed WARN_ONs is small. While
> this patch makes backtraces from all CPUs always unusable without
> caller_id and non-trivial effort.

I would prefer we solve the situation for non-NMI as well, not just for
the sysrq "l" case.

>> For the long term, we should introduce a printk-context API that allows
>> callers to perfectly pack their multi-line output into a single
>> entry. We discussed [0][1] this back in August 2020.
>
> We need a "short" term solution. There are currently 3 solutions:
>
> 1. Keep nmi_safe() and all the hacks around.
>
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>    the special lock used also by atomic consoles.
>
> 3. Tell complaining people how to sort the messed logs.

Or we look into the long term solution now. If caller-id's cannot not be
used as the solution (because nobody turns it on, nobody knows about it,
and/or distros do not enable it), then we should look at how to make at
least the backtraces contiguous. I have a few ideas here.

John Ogness

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-06 11:01           ` John Ogness
  0 siblings, 0 replies; 31+ messages in thread
From: John Ogness @ 2021-04-06 11:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Alexey Kardashevskiy, Paul Mackerras,
	Tiezhu Yang, Rafael Aquini, Aneesh Kumar K.V, Peter Zijlstra,
	Yue Hu, Jordan Niethe, Kees Cook, Paul E. McKenney,
	Alistair Popple, Guilherme G. Piccoli, Nicholas Piggin,
	Steven Rostedt, Thomas Gleixner, kexec, linux-kernel,
	Sergey Senozhatsky, Eric Biederman, Andrew Morton, linuxppc-dev

On 2021-04-01, Petr Mladek <pmladek@suse.com> wrote:
>> Caller-id solves this problem and is easy to sort for anyone with
>> `grep'. Yes, it is a shame that `dmesg' does not show it, but
>> directly using any of the printk interfaces does show it (kmsg_dump,
>> /dev/kmsg, syslog, console).
>
> True but frankly, the current situation is _far_ from convenient:
>
>    + consoles do not show it by default
>    + none userspace tool (dmesg, journalctl, crash) is able to show it
>    + grep is a nightmare, especially if you have more than handful of CPUs
>
> Yes, everything is solvable but not easily.
>
>> >     I get this with "echo l >/proc/sysrq-trigger" and this patchset:
>> 
>> Of course. Without caller-id, it is a mess. But this has nothing to do
>> with NMI. The same problem exists for WARN_ON() on multiple CPUs
>> simultaneously. If the user is not using caller-id, they are
>> lost. Caller-id is the current solution to the interlaced logs.
>
> Sure. But in reality, the risk of mixed WARN_ONs is small. While
> this patch makes backtraces from all CPUs always unusable without
> caller_id and non-trivial effort.

I would prefer we solve the situation for non-NMI as well, not just for
the sysrq "l" case.

>> For the long term, we should introduce a printk-context API that allows
>> callers to perfectly pack their multi-line output into a single
>> entry. We discussed [0][1] this back in August 2020.
>
> We need a "short" term solution. There are currently 3 solutions:
>
> 1. Keep nmi_safe() and all the hacks around.
>
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>    the special lock used also by atomic consoles.
>
> 3. Tell complaining people how to sort the messed logs.

Or we look into the long term solution now. If caller-id's cannot not be
used as the solution (because nobody turns it on, nobody knows about it,
and/or distros do not enable it), then we should look at how to make at
least the backtraces contiguous. I have a few ideas here.

John Ogness

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
  2021-04-02  2:14           ` Sergey Senozhatsky
@ 2021-04-06 11:17             ` Petr Mladek
  -1 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-06 11:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, linux-kernel, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Christophe Leroy, Nicholas Piggin, Alistair Popple,
	Jordan Niethe, Peter Zijlstra, Aneesh Kumar K.V, Andrew Morton,
	Kees Cook, Tiezhu Yang, Alexey Kardashevskiy, Yue Hu,
	Rafael Aquini, Guilherme G. Piccoli, Paul E. McKenney,
	linuxppc-dev, kexec

On Fri 2021-04-02 11:14:18, Sergey Senozhatsky wrote:
> On (21/04/01 16:17), Petr Mladek wrote:
> > > For the long term, we should introduce a printk-context API that allows
> > > callers to perfectly pack their multi-line output into a single
> > > entry. We discussed [0][1] this back in August 2020.
> > 
> > We need a "short" term solution. There are currently 3 solutions:
> > 
> > 1. Keep nmi_safe() and all the hacks around.
> > 
> > 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
> >    the special lock used also by atomic consoles.
> > 
> > 3. Tell complaining people how to sort the messed logs.
> 
> Are we talking about nmi_cpu_backtrace()->dump_stack() or some
> other path?

It is about serializing

			if (regs)
				show_regs(regs);
			else
				dump_stack();

in nmi_cpu_backtrace() when it is triggered on many(all) CPUs
at the same time.


> dump_stack() seems to be already serialized by `dump_lock`. Hmm,
> show_regs() is not serialized, seems like it should be under the
> same `dump_lock` as dump_stack().

Ah, I think that you already mentioned it in the past and I forget it.

Yes, we would need to synchronize all these dump/show functions using
the same lock. It is already the lock that might be taken recursively
on the same CPU.

In each case, we must not introduce another lock in
nmi_cpu_backtrace() because it might cause deadlock with
@dump_lock.

Anyway, I would really like to keep the dumps serialized. So, we
either need to use the same lock everywhere or we need to keep
nmi_safe buffers for now.

I would like to remove the nmi_safe buffers in the long term
but I am fine with doing it later after the consoles rework.
I'll leave the prioritization for John who is doing the work
and might have some preferences.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH printk v2 2/5] printk: remove safe buffers
@ 2021-04-06 11:17             ` Petr Mladek
  0 siblings, 0 replies; 31+ messages in thread
From: Petr Mladek @ 2021-04-06 11:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Peter Zijlstra, Paul Mackerras, Tiezhu Yang,
	Rafael Aquini, Paul E. McKenney, Aneesh Kumar K.V,
	Alexey Kardashevskiy, Yue Hu, Jordan Niethe, Kees Cook,
	John Ogness, Alistair Popple, Guilherme G. Piccoli,
	Nicholas Piggin, Steven Rostedt, Thomas Gleixner, kexec,
	linux-kernel, Sergey Senozhatsky, Eric Biederman, Andrew Morton,
	linuxppc-dev

On Fri 2021-04-02 11:14:18, Sergey Senozhatsky wrote:
> On (21/04/01 16:17), Petr Mladek wrote:
> > > For the long term, we should introduce a printk-context API that allows
> > > callers to perfectly pack their multi-line output into a single
> > > entry. We discussed [0][1] this back in August 2020.
> > 
> > We need a "short" term solution. There are currently 3 solutions:
> > 
> > 1. Keep nmi_safe() and all the hacks around.
> > 
> > 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
> >    the special lock used also by atomic consoles.
> > 
> > 3. Tell complaining people how to sort the messed logs.
> 
> Are we talking about nmi_cpu_backtrace()->dump_stack() or some
> other path?

It is about serializing

			if (regs)
				show_regs(regs);
			else
				dump_stack();

in nmi_cpu_backtrace() when it is triggered on many(all) CPUs
at the same time.


> dump_stack() seems to be already serialized by `dump_lock`. Hmm,
> show_regs() is not serialized, seems like it should be under the
> same `dump_lock` as dump_stack().

Ah, I think that you already mentioned it in the past and I forget it.

Yes, we would need to synchronize all these dump/show functions using
the same lock. It is already the lock that might be taken recursively
on the same CPU.

In each case, we must not introduce another lock in
nmi_cpu_backtrace() because it might cause deadlock with
@dump_lock.

Anyway, I would really like to keep the dumps serialized. So, we
either need to use the same lock everywhere or we need to keep
nmi_safe buffers for now.

I would like to remove the nmi_safe buffers in the long term
but I am fine with doing it later after the consoles rework.
I'll leave the prioritization for John who is doing the work
and might have some preferences.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-04-06 11:18 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30 15:35 [PATCH printk v2 0/5] printk: remove safe buffers John Ogness
2021-03-30 15:35 ` John Ogness
2021-03-30 15:35 ` John Ogness
2021-03-30 15:35 ` [PATCH printk v2 1/5] printk: track/limit recursion John Ogness
2021-04-01 10:00   ` Petr Mladek
2021-04-02  2:03     ` Sergey Senozhatsky
2021-03-30 15:35 ` [PATCH printk v2 2/5] printk: remove safe buffers John Ogness
2021-03-30 15:35   ` John Ogness
2021-03-31  7:59   ` John Ogness
2021-03-31  7:59     ` John Ogness
2021-04-01 12:21   ` Petr Mladek
2021-04-01 12:21     ` Petr Mladek
2021-04-01 13:19     ` John Ogness
2021-04-01 13:19       ` John Ogness
2021-04-01 14:17       ` Petr Mladek
2021-04-01 14:17         ` Petr Mladek
2021-04-02  2:14         ` Sergey Senozhatsky
2021-04-02  2:14           ` Sergey Senozhatsky
2021-04-06 11:17           ` Petr Mladek
2021-04-06 11:17             ` Petr Mladek
2021-04-06 11:01         ` John Ogness
2021-04-06 11:01           ` John Ogness
2021-03-30 15:35 ` [PATCH printk v2 3/5] printk: remove NMI tracking John Ogness
2021-03-30 15:35   ` John Ogness
2021-03-30 15:35   ` John Ogness
2021-04-01 14:37   ` Petr Mladek
2021-04-01 14:37     ` Petr Mladek
2021-04-01 14:37     ` Petr Mladek
2021-03-30 15:35 ` [PATCH printk v2 4/5] printk: convert @syslog_lock to mutex John Ogness
2021-04-01 15:13   ` Petr Mladek
2021-03-30 15:35 ` [PATCH printk v2 5/5] printk: syslog: close window between wait and read John Ogness

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.