linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH printk v3 0/6] printk: remove safe buffers
@ 2021-06-24 11:11 John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs John Ogness
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Eric Biederman,
	Nicholas Piggin, Christophe Leroy, Cédric Le Goater,
	Andrew Morton, Kees Cook, Tiezhu Yang, Yue Hu,
	Alexey Kardashevskiy, linuxppc-dev, kexec, Russell King,
	Ingo Molnar, Marc Zyngier, Valentin Schneider, Pekka Enberg,
	Mike Rapoport, Wolfram Sang (Renesas),
	Anshuman Khandual, Peter Zijlstra, Frederic Weisbecker,
	Masahiro Yamada, Nathan Chancellor, Sami Tolvanen,
	Alexei Starovoitov, Nick Terrell, Chris Wilson, Vlastimil Babka,
	linux-arm-kernel

Hi,

Here is v3 of a series to remove the safe buffers. v2 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
implemented for this series.

The safe buffers also implicitly provided serialization
between multiple CPUs executing in NMI context. This was
particularly necessary for the nmi_backtrace() output. This
serializiation is now preserved by using the printk_cpu_lock.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

Changes since v2:

- Move irq disabling/enabling out of the
  console_lock_spinning_*() functions to simplify the patches
  keep the function prototypes simple.

- Change printk_enter_irqsave()/printk_exit_irqrestore() to
  macros to allow a more common calling convention for irq
  flags.

- Use the counter pointer from printk_enter_irqsave() in
  printk_exit_irqrestore() rather than fetching it again. This
  avoids any possible race conditions when printk's percpu
  flag is set.

- Use the printk_cpu_lock to serialize banner and regs with
  the stack dump in nmi_cpu_backtrace().

John Ogness

[0] https://lore.kernel.org/lkml/20210330153512.1182-1-john.ogness@linutronix.de

John Ogness (6):
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c          |   2 -
 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c     |   3 -
 include/linux/hardirq.h        |   2 -
 include/linux/printk.h         |  22 --
 init/Kconfig                   |   5 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  23 ---
 kernel/printk/printk.c         | 273 +++++++++++++++----------
 kernel/printk/printk_safe.c    | 361 +--------------------------------
 kernel/trace/trace.c           |   2 -
 lib/nmi_backtrace.c            |  13 +-
 14 files changed, 176 insertions(+), 540 deletions(-)


base-commit: 48e72544d6f06daedbf1d9b14610be89dba67526
-- 
2.20.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-24 12:26   ` Petr Mladek
  2021-06-24 11:11 ` [PATCH printk v3 2/6] printk: track/limit recursion John Ogness
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney

Currently the nmi_backtrace is serialized against other CPUs because
the messages are sent to the NMI buffers. Once these buffers are
removed, only the dumped stack will be serialized against other CPUs
(via the printk_cpu_lock).

Also serialize the nmi_backtrace banner and regs using the
printk_cpu_lock so that per-CPU serialization will be preserved even
after the NMI buffers are removed.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 lib/nmi_backtrace.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 8abe1870dba4..dae233c5f597 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -92,17 +92,24 @@ module_param(backtrace_idle, bool, 0644);
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
 	int cpu = smp_processor_id();
+	unsigned long flags;
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 		if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
 		} else {
+			/*
+			 * Allow nested NMI backtraces while serializing
+			 * against other CPUs.
+			 */
+			printk_cpu_lock_irqsave(flags);
 			pr_warn("NMI backtrace for cpu %d\n", cpu);
 			if (regs)
 				show_regs(regs);
 			else
 				dump_stack();
+			printk_cpu_unlock_irqrestore(flags);
 		}
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH printk v3 2/6] printk: track/limit recursion
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-24 12:55   ` Petr Mladek
  2021-06-24 11:11 ` [PATCH printk v3 3/6] printk: remove safe buffers John Ogness
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Currently the printk safe buffers provide a form of recursion
protection by redirecting to the safe buffers whenever printk() is
recursively called.

In preparation for removal of the safe buffers, provide an alternate
explicit recursion protection. Recursion is limited to 3 levels
per-CPU and per-context.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 86 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..7fa0b4d91975 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1940,6 +1940,76 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
 	}
 }
 
+/*
+ * Recursion is tracked separately on each CPU. If NMIs are supported, an
+ * additional NMI context per CPU is also separately tracked. Until per-CPU
+ * is available, a separate "early tracking" is performed.
+ */
+static DEFINE_PER_CPU(u8, printk_count);
+static u8 printk_count_early;
+#ifdef CONFIG_HAVE_NMI
+static DEFINE_PER_CPU(u8, printk_count_nmi);
+static u8 printk_count_nmi_early;
+#endif
+
+/*
+ * Recursion is limited to keep the output sane. printk() should not require
+ * more than 1 level of recursion (allowing, for example, printk() to trigger
+ * a WARN), but a higher value is used in case some printk-internal errors
+ * exist, such as the ringbuffer validation checks failing.
+ */
+#define PRINTK_MAX_RECURSION 3
+
+/*
+ * Return a pointer to the dedicated counter for the CPU+context of the
+ * caller.
+ */
+static u8 *__printk_recursion_counter(void)
+{
+#ifdef CONFIG_HAVE_NMI
+	if (in_nmi()) {
+		if (printk_percpu_data_ready())
+			return this_cpu_ptr(&printk_count_nmi);
+		return &printk_count_nmi_early;
+	}
+#endif
+	if (printk_percpu_data_ready())
+		return this_cpu_ptr(&printk_count);
+	return &printk_count_early;
+}
+
+/*
+ * Enter recursion tracking. Interrupts are disabled to simplify tracking.
+ * The caller must check the boolean return value to see if the recursion is
+ * allowed. On failure, interrupts are not disabled.
+ *
+ * @recursion_ptr must be a variable of type (u8 *) and is the same variable
+ * that is passed to printk_exit_irqrestore().
+ */
+#define printk_enter_irqsave(recursion_ptr, flags)	\
+({							\
+	bool success = true;				\
+							\
+	typecheck(u8 *, recursion_ptr);			\
+	local_irq_save(flags);				\
+	(recursion_ptr) = __printk_recursion_counter();	\
+	if (*(recursion_ptr) > PRINTK_MAX_RECURSION) {	\
+		local_irq_restore(flags);		\
+		success = false;			\
+	} else {					\
+		(*(recursion_ptr))++;			\
+	}						\
+	success;					\
+})
+
+/* Exit recursion tracking, restoring interrupts. */
+#define printk_exit_irqrestore(recursion_ptr, flags)	\
+	do {						\
+		typecheck(u8 *, recursion_ptr);		\
+		(*(recursion_ptr))--;			\
+		local_irq_restore(flags);		\
+	} while (0)
+
 int printk_delay_msec __read_mostly;
 
 static inline void printk_delay(void)
@@ -2040,11 +2110,14 @@ int vprintk_store(int facility, int level,
 	struct prb_reserved_entry e;
 	enum log_flags lflags = 0;
 	struct printk_record r;
+	unsigned long irqflags;
 	u16 trunc_msg_len = 0;
 	char prefix_buf[8];
+	u8 *recursion_ptr;
 	u16 reserve_size;
 	va_list args2;
 	u16 text_len;
+	int ret = 0;
 	u64 ts_nsec;
 
 	/*
@@ -2055,6 +2128,9 @@ int vprintk_store(int facility, int level,
 	 */
 	ts_nsec = local_clock();
 
+	if (!printk_enter_irqsave(recursion_ptr, irqflags))
+		return 0;
+
 	/*
 	 * The sprintf needs to come first since the syslog prefix might be
 	 * passed in as a parameter. An extra byte must be reserved so that
@@ -2092,7 +2168,8 @@ int vprintk_store(int facility, int level,
 				prb_commit(&e);
 			}
 
-			return text_len;
+			ret = text_len;
+			goto out;
 		}
 	}
 
@@ -2108,7 +2185,7 @@ int vprintk_store(int facility, int level,
 
 		prb_rec_init_wr(&r, reserve_size + trunc_msg_len);
 		if (!prb_reserve(&e, prb, &r))
-			return 0;
+			goto out;
 	}
 
 	/* fill message */
@@ -2130,7 +2207,10 @@ int vprintk_store(int facility, int level,
 	else
 		prb_final_commit(&e);
 
-	return (text_len + trunc_msg_len);
+	ret = text_len + trunc_msg_len;
+out:
+	printk_exit_irqrestore(recursion_ptr, irqflags);
+	return ret;
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH printk v3 3/6] printk: remove safe buffers
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 2/6] printk: track/limit recursion John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-24 14:49   ` Petr Mladek
  2021-06-24 11:11 ` [PATCH printk v3 4/6] printk: remove NMI tracking John Ogness
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Eric Biederman, Nicholas Piggin,
	Christophe Leroy, Cédric Le Goater, Andrew Morton,
	Kees Cook, Tiezhu Yang, Yue Hu, Alexey Kardashevskiy,
	Paul E. McKenney, linuxppc-dev, kexec

With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 arch/powerpc/kernel/traps.c    |   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h         |  10 -
 kernel/kexec_core.c            |   1 -
 kernel/panic.c                 |   3 -
 kernel/printk/internal.h       |  17 --
 kernel/printk/printk.c         | 126 +++++--------
 kernel/printk/printk_safe.c    | 332 +--------------------------------
 lib/nmi_backtrace.c            |   6 -
 9 files changed, 51 insertions(+), 450 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a44a30b0688c..5828c83eaca6 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -171,7 +171,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	wd_smp_unlock(&flags);
 
-	printk_safe_flush();
-	/*
-	 * printk_safe_flush() seems to require another print
-	 * before anything actually goes out to console.
-	 */
 	if (sysctl_hardlockup_all_cpu_backtrace)
 		trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1790a5521fd9..664612f75dac 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)	\
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#define printk_safe_enter_irq()		\
-	do {					\
-		local_irq_disable();		\
-		__printk_safe_enter();		\
-	} while (0)
-
-#define printk_safe_exit_irq()			\
-	do {					\
-		__printk_safe_exit();		\
-		local_irq_enable();		\
-	} while (0)
-
 void defer_console_output(void);
 
 #else
@@ -61,9 +48,5 @@ void defer_console_output(void);
 #define printk_safe_enter_irqsave(flags) local_irq_save(flags)
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
-#define printk_safe_enter_irq() local_irq_disable()
-#define printk_safe_exit_irq() local_irq_enable()
-
-static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7fa0b4d91975..495520b7369c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -732,27 +732,22 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 
-	printk_safe_enter_irq();
 	if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
-			printk_safe_exit_irq();
 			goto out;
 		}
 
-		printk_safe_exit_irq();
 		ret = wait_event_interruptible(log_wait,
 				prb_read_valid(prb, atomic64_read(&user->seq), r));
 		if (ret)
 			goto out;
-		printk_safe_enter_irq();
 	}
 
 	if (r->info->seq != atomic64_read(&user->seq)) {
 		/* our last seen message is gone, return error and reset */
 		atomic64_set(&user->seq, r->info->seq);
 		ret = -EPIPE;
-		printk_safe_exit_irq();
 		goto out;
 	}
 
@@ -762,7 +757,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 				  &r->info->dev_info);
 
 	atomic64_set(&user->seq, r->info->seq + 1);
-	printk_safe_exit_irq();
 
 	if (len > count) {
 		ret = -EINVAL;
@@ -797,7 +791,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	if (offset)
 		return -ESPIPE;
 
-	printk_safe_enter_irq();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -818,7 +811,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	default:
 		ret = -EINVAL;
 	}
-	printk_safe_exit_irq();
 	return ret;
 }
 
@@ -833,7 +825,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &log_wait, wait);
 
-	printk_safe_enter_irq();
 	if (prb_read_valid_info(prb, atomic64_read(&user->seq), &info, NULL)) {
 		/* return error when data has vanished underneath us */
 		if (info.seq != atomic64_read(&user->seq))
@@ -841,7 +832,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 		else
 			ret = EPOLLIN|EPOLLRDNORM;
 	}
-	printk_safe_exit_irq();
 
 	return ret;
 }
@@ -874,9 +864,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	prb_rec_init_rd(&user->record, &user->info,
 			&user->text_buf[0], sizeof(user->text_buf));
 
-	printk_safe_enter_irq();
 	atomic64_set(&user->seq, prb_first_valid_seq(prb));
-	printk_safe_exit_irq();
 
 	file->private_data = user;
 	return 0;
@@ -1042,9 +1030,6 @@ static inline void log_buf_add_cpu(void) {}
 
 static void __init set_percpu_data_ready(void)
 {
-	printk_safe_init();
-	/* Make sure we set this flag only after printk_safe() init is done */
-	barrier();
 	__printk_percpu_data_ready = true;
 }
 
@@ -1082,6 +1067,7 @@ void __init setup_log_buf(int early)
 	struct prb_desc *new_descs;
 	struct printk_info info;
 	struct printk_record r;
+	unsigned int text_size;
 	size_t new_descs_size;
 	size_t new_infos_size;
 	unsigned long flags;
@@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
 		 new_descs, ilog2(new_descs_count),
 		 new_infos);
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
 
 	free = __LOG_BUF_LEN;
-	prb_for_each_record(0, &printk_rb_static, seq, &r)
-		free -= add_to_rb(&printk_rb_dynamic, &r);
+	prb_for_each_record(0, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
-	/*
-	 * This is early enough that everything is still running on the
-	 * boot CPU and interrupts are disabled. So no new messages will
-	 * appear during the transition to the dynamic buffer.
-	 */
 	prb = &printk_rb_dynamic;
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
+
+	/*
+	 * Copy any remaining messages that might have appeared from
+	 * NMI context after copying but before switching to the
+	 * dynamic buffer.
+	 */
+	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
+		text_size = add_to_rb(&printk_rb_dynamic, &r);
+		if (text_size > free)
+			free = 0;
+		else
+			free -= text_size;
+	}
 
 	if (seq != prb_next_seq(&printk_rb_static)) {
 		pr_err("dropped %llu messages\n",
@@ -1498,11 +1497,9 @@ static int syslog_print(char __user *buf, int size)
 		size_t n;
 		size_t skip;
 
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			break;
 		}
 		if (r.info->seq != syslog_seq) {
@@ -1531,8 +1528,7 @@ static int syslog_print(char __user *buf, int size)
 			syslog_partial += n;
 		} else
 			n = 0;
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 
 		if (!n)
 			break;
@@ -1566,7 +1562,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	time = printk_time;
-	printk_safe_enter_irq();
 	/*
 	 * Find first record that fits, including all following records,
 	 * into the user-provided buffer for this dump.
@@ -1587,23 +1582,20 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			break;
 		}
 
-		printk_safe_exit_irq();
 		if (copy_to_user(buf + len, text, textlen))
 			len = -EFAULT;
 		else
 			len += textlen;
-		printk_safe_enter_irq();
 
 		if (len < 0)
 			break;
 	}
 
 	if (clear) {
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		latched_seq_write(&clear_seq, seq);
-		raw_spin_unlock(&syslog_lock);
+		raw_spin_unlock_irq(&syslog_lock);
 	}
-	printk_safe_exit_irq();
 
 	kfree(text);
 	return len;
@@ -1611,11 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 
 static void syslog_clear(void)
 {
-	printk_safe_enter_irq();
-	raw_spin_lock(&syslog_lock);
+	raw_spin_lock_irq(&syslog_lock);
 	latched_seq_write(&clear_seq, prb_next_seq(prb));
-	raw_spin_unlock(&syslog_lock);
-	printk_safe_exit_irq();
+	raw_spin_unlock_irq(&syslog_lock);
 }
 
 /* Return a consistent copy of @syslog_seq. */
@@ -1703,12 +1693,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		printk_safe_enter_irq();
-		raw_spin_lock(&syslog_lock);
+		raw_spin_lock_irq(&syslog_lock);
 		if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
 			/* No unread messages. */
-			raw_spin_unlock(&syslog_lock);
-			printk_safe_exit_irq();
+			raw_spin_unlock_irq(&syslog_lock);
 			return 0;
 		}
 		if (info.seq != syslog_seq) {
@@ -1736,8 +1724,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			}
 			error -= syslog_partial;
 		}
-		raw_spin_unlock(&syslog_lock);
-		printk_safe_exit_irq();
+		raw_spin_unlock_irq(&syslog_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -1852,7 +1839,7 @@ static int console_trylock_spinning(void)
 	if (console_trylock())
 		return 1;
 
-	printk_safe_enter_irqsave(flags);
+	local_irq_save(flags);
 
 	raw_spin_lock(&console_owner_lock);
 	owner = READ_ONCE(console_owner);
@@ -1873,7 +1860,7 @@ static int console_trylock_spinning(void)
 	 * that active printer isn't us (recursive printk?).
 	 */
 	if (!spin) {
-		printk_safe_exit_irqrestore(flags);
+		local_irq_restore(flags);
 		return 0;
 	}
 
@@ -1884,7 +1871,7 @@ static int console_trylock_spinning(void)
 		cpu_relax();
 	spin_release(&console_owner_dep_map, _THIS_IP_);
 
-	printk_safe_exit_irqrestore(flags);
+	local_irq_restore(flags);
 	/*
 	 * The owner passed the console lock to us.
 	 * Since we did not spin on console lock, annotate
@@ -2219,7 +2206,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 {
 	int printed_len;
 	bool in_sched = false;
-	unsigned long flags;
 
 	/* Suppress unimportant messages after panic happens */
 	if (unlikely(suppress_printk))
@@ -2233,9 +2219,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	boot_delay_msec(level);
 	printk_delay();
 
-	printk_safe_enter_irqsave(flags);
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
-	printk_safe_exit_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
@@ -2664,9 +2648,9 @@ void console_unlock(void)
 
 	for (;;) {
 		size_t ext_len = 0;
+		int handover;
 		size_t len;
 
-		printk_safe_enter_irqsave(flags);
 skip:
 		if (!prb_read_valid(prb, console_seq, &r))
 			break;
@@ -2716,19 +2700,22 @@ void console_unlock(void)
 		 * were to occur on another CPU, it may wait for this one to
 		 * finish. This task can not be preempted if there is a
 		 * waiter waiting to take over.
+		 *
+		 * Interrupts are disabled because the hand over to a waiter
+		 * must not be interrupted until the hand over is completed
+		 * (@console_waiter is cleared).
 		 */
+		local_irq_save(flags);
 		console_lock_spinning_enable();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
-		if (console_lock_spinning_disable_and_check()) {
-			printk_safe_exit_irqrestore(flags);
+		handover = console_lock_spinning_disable_and_check();
+		local_irq_restore(flags);
+		if (handover)
 			return;
-		}
-
-		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
@@ -2745,8 +2732,6 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	retry = prb_read_valid(prb, console_seq, NULL);
-	printk_safe_exit_irqrestore(flags);
-
 	if (retry && console_trylock())
 		goto again;
 }
@@ -2808,13 +2793,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	console_trylock();
 	console_may_schedule = 0;
 
-	if (mode == CONSOLE_REPLAY_ALL) {
-		unsigned long flags;
-
-		printk_safe_enter_irqsave(flags);
+	if (mode == CONSOLE_REPLAY_ALL)
 		console_seq = prb_first_valid_seq(prb);
-		printk_safe_exit_irqrestore(flags);
-	}
 	console_unlock();
 }
 
@@ -3466,14 +3446,12 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	struct printk_info info;
 	unsigned int line_count;
 	struct printk_record r;
-	unsigned long flags;
 	size_t l = 0;
 	bool ret = false;
 
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	prb_rec_init_rd(&r, &info, line, size);
 
 	/* Read text or count text lines? */
@@ -3494,7 +3472,6 @@ bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
 	iter->cur_seq = r.info->seq + 1;
 	ret = true;
 out:
-	printk_safe_exit_irqrestore(flags);
 	if (len)
 		*len = l;
 	return ret;
@@ -3526,7 +3503,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	u64 min_seq = latched_seq_read_nolock(&clear_seq);
 	struct printk_info info;
 	struct printk_record r;
-	unsigned long flags;
 	u64 seq;
 	u64 next_seq;
 	size_t len = 0;
@@ -3539,7 +3515,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	if (iter->cur_seq < min_seq)
 		iter->cur_seq = min_seq;
 
-	printk_safe_enter_irqsave(flags);
 	if (prb_read_valid_info(prb, iter->cur_seq, &info, NULL)) {
 		if (info.seq != iter->cur_seq) {
 			/* messages are gone, move to first available one */
@@ -3548,10 +3523,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 	}
 
 	/* last entry */
-	if (iter->cur_seq >= iter->next_seq) {
-		printk_safe_exit_irqrestore(flags);
+	if (iter->cur_seq >= iter->next_seq)
 		goto out;
-	}
 
 	/*
 	 * Find first record that fits, including all following records,
@@ -3583,7 +3556,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
 
 	iter->next_seq = next_seq;
 	ret = true;
-	printk_safe_exit_irqrestore(flags);
 out:
 	if (len_out)
 		*len_out = len;
@@ -3601,12 +3573,8 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
-	unsigned long flags;
-
-	printk_safe_enter_irqsave(flags);
 	iter->cur_seq = latched_seq_read_nolock(&clear_seq);
 	iter->next_seq = prb_next_seq(prb);
-	printk_safe_exit_irqrestore(flags);
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 94232186fccb..0456cd48d01c 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -15,286 +15,9 @@
 
 #include "internal.h"
 
-/*
- * In NMI and safe mode, printk() avoids taking locks. Instead,
- * it uses an alternative implementation that temporary stores
- * the strings into a per-CPU buffer. The content of the buffer
- * is later flushed into the main ring buffer via IRQ work.
- *
- * The alternative implementation is chosen transparently
- * by examining current printk() context mask stored in @printk_context
- * per-CPU variable.
- *
- * The implementation allows to flush the strings also from another CPU.
- * There are situations when we want to make sure that all buffers
- * were handled or when IRQs are blocked.
- */
-
-#define SAFE_LOG_BUF_LEN ((1 << CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT) -	\
-				sizeof(atomic_t) -			\
-				sizeof(atomic_t) -			\
-				sizeof(struct irq_work))
-
-struct printk_safe_seq_buf {
-	atomic_t		len;	/* length of written data */
-	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
-	unsigned char		buffer[SAFE_LOG_BUF_LEN];
-};
-
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
 static DEFINE_PER_CPU(int, printk_context);
 
-static DEFINE_RAW_SPINLOCK(safe_read_lock);
-
-#ifdef CONFIG_PRINTK_NMI
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
-#endif
-
-/* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
-{
-	if (printk_percpu_data_ready())
-		irq_work_queue(&s->work);
-}
-
-/*
- * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
- * have dedicated buffers, because otherwise printk-safe preempted by
- * NMI-printk would have overwritten the NMI messages.
- *
- * The messages are flushed from irq work (or from panic()), possibly,
- * from other CPU, concurrently with printk_safe_log_store(). Should this
- * happen, printk_safe_log_store() will notice the buffer->len mismatch
- * and repeat the write.
- */
-static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
-						const char *fmt, va_list args)
-{
-	int add;
-	size_t len;
-	va_list ap;
-
-again:
-	len = atomic_read(&s->len);
-
-	/* The trailing '\0' is not counted into len. */
-	if (len >= sizeof(s->buffer) - 1) {
-		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
-		return 0;
-	}
-
-	/*
-	 * Make sure that all old data have been read before the buffer
-	 * was reset. This is not needed when we just append data.
-	 */
-	if (!len)
-		smp_rmb();
-
-	va_copy(ap, args);
-	add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap);
-	va_end(ap);
-	if (!add)
-		return 0;
-
-	/*
-	 * Do it once again if the buffer has been flushed in the meantime.
-	 * Note that atomic_cmpxchg() is an implicit memory barrier that
-	 * makes sure that the data were written before updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, len + add) != len)
-		goto again;
-
-	queue_flush_work(s);
-	return add;
-}
-
-static inline void printk_safe_flush_line(const char *text, int len)
-{
-	/*
-	 * Avoid any console drivers calls from here, because we may be
-	 * in NMI or printk_safe context (when in panic). The messages
-	 * must go only into the ring buffer at this stage.  Consoles will
-	 * get explicitly called later when a crashdump is not generated.
-	 */
-	printk_deferred("%.*s", len, text);
-}
-
-/* printk part of the temporary buffer line by line */
-static int printk_safe_flush_buffer(const char *start, size_t len)
-{
-	const char *c, *end;
-	bool header;
-
-	c = start;
-	end = start + len;
-	header = true;
-
-	/* Print line by line. */
-	while (c < end) {
-		if (*c == '\n') {
-			printk_safe_flush_line(start, c - start + 1);
-			start = ++c;
-			header = true;
-			continue;
-		}
-
-		/* Handle continuous lines or missing new line. */
-		if ((c + 1 < end) && printk_get_level(c)) {
-			if (header) {
-				c = printk_skip_level(c);
-				continue;
-			}
-
-			printk_safe_flush_line(start, c - start);
-			start = c++;
-			header = true;
-			continue;
-		}
-
-		header = false;
-		c++;
-	}
-
-	/* Check if there was a partial line. Ignore pure header. */
-	if (start < end && !header) {
-		static const char newline[] = KERN_CONT "\n";
-
-		printk_safe_flush_line(start, end - start);
-		printk_safe_flush_line(newline, strlen(newline));
-	}
-
-	return len;
-}
-
-static void report_message_lost(struct printk_safe_seq_buf *s)
-{
-	int lost = atomic_xchg(&s->message_lost, 0);
-
-	if (lost)
-		printk_deferred("Lost %d message(s)!\n", lost);
-}
-
-/*
- * Flush data from the associated per-CPU buffer. The function
- * can be called either via IRQ work or independently.
- */
-static void __printk_safe_flush(struct irq_work *work)
-{
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
-	unsigned long flags;
-	size_t len;
-	int i;
-
-	/*
-	 * The lock has two functions. First, one reader has to flush all
-	 * available message to make the lockless synchronization with
-	 * writers easier. Second, we do not want to mix messages from
-	 * different CPUs. This is especially important when printing
-	 * a backtrace.
-	 */
-	raw_spin_lock_irqsave(&safe_read_lock, flags);
-
-	i = 0;
-more:
-	len = atomic_read(&s->len);
-
-	/*
-	 * This is just a paranoid check that nobody has manipulated
-	 * the buffer an unexpected way. If we printed something then
-	 * @len must only increase. Also it should never overflow the
-	 * buffer size.
-	 */
-	if ((i && i >= len) || len > sizeof(s->buffer)) {
-		const char *msg = "printk_safe_flush: internal error\n";
-
-		printk_safe_flush_line(msg, strlen(msg));
-		len = 0;
-	}
-
-	if (!len)
-		goto out; /* Someone else has already flushed the buffer. */
-
-	/* Make sure that data has been written up to the @len */
-	smp_rmb();
-	i += printk_safe_flush_buffer(s->buffer + i, len - i);
-
-	/*
-	 * Check that nothing has got added in the meantime and truncate
-	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
-	 * barrier that makes sure that the data were copied before
-	 * updating s->len.
-	 */
-	if (atomic_cmpxchg(&s->len, len, 0) != len)
-		goto more;
-
-out:
-	report_message_lost(s);
-	raw_spin_unlock_irqrestore(&safe_read_lock, flags);
-}
-
-/**
- * printk_safe_flush - flush all per-cpu nmi buffers.
- *
- * The buffers are flushed automatically via IRQ work. This function
- * is useful only when someone wants to be sure that all buffers have
- * been flushed at some point.
- */
-void printk_safe_flush(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-#ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-#endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
-	}
-}
-
-/**
- * printk_safe_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_safe_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_safe_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the safe buffers.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (raw_spin_is_locked(&safe_read_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&safe_read_lock);
-	}
-
-	printk_safe_flush();
-}
-
 #ifdef CONFIG_PRINTK_NMI
-/*
- * Safe printk() for NMI context. It uses a per-CPU buffer to
- * store the message. NMIs are not nested, so there is always only
- * one writer running. But the buffer might get flushed from another
- * CPU, so we need to be careful.
- */
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 void noinstr printk_nmi_enter(void)
 {
 	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
@@ -309,9 +32,6 @@ void noinstr printk_nmi_exit(void)
  * Marks a code that might produce many messages in NMI context
  * and the risk of losing them is more critical than eventual
  * reordering.
- *
- * It has effect only when called in NMI context. Then printk()
- * will store the messages into the main logbuf directly.
  */
 void printk_nmi_direct_enter(void)
 {
@@ -324,27 +44,8 @@ void printk_nmi_direct_exit(void)
 	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
 }
 
-#else
-
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
-	return 0;
-}
-
 #endif /* CONFIG_PRINTK_NMI */
 
-/*
- * Lock-less printk(), to avoid deadlocks should the printk() recurse
- * into itself. It uses a per-CPU buffer to store the message, just like
- * NMI.
- */
-static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
-{
-	struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq);
-
-	return printk_safe_log_store(s, fmt, args);
-}
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) &
+	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
+	     PRINTK_NMI_CONTEXT_MASK |
+	     PRINTK_SAFE_CONTEXT_MASK)) {
 		unsigned long flags;
 		int len;
 
@@ -380,35 +84,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		return len;
 	}
 
-	/* Use extra buffer in NMI. */
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		return vprintk_nmi(fmt, args);
-
-	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
-	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
-		return vprintk_safe(fmt, args);
-
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
 EXPORT_SYMBOL(vprintk);
-
-void __init printk_safe_init(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct printk_safe_seq_buf *s;
-
-		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-
-#ifdef CONFIG_PRINTK_NMI
-		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
-#endif
-	}
-
-	/* Flush pending messages that did not have scheduled IRQ works. */
-	printk_safe_flush();
-}
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index dae233c5f597..9813a983d024 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
-	/*
-	 * Force flush any remote buffers that might be stuck in IRQ context
-	 * and therefore could not run their irq_work.
-	 */
-	printk_safe_flush();
-
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH printk v3 4/6] printk: remove NMI tracking
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
                   ` (2 preceding siblings ...)
  2021-06-24 11:11 ` [PATCH printk v3 3/6] printk: remove safe buffers John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-25 12:36   ` Petr Mladek
  2021-06-24 11:11 ` [PATCH printk v3 5/6] printk: convert @syslog_lock to mutex John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
  5 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Pekka Enberg,
	Mike Rapoport, Wolfram Sang (Renesas),
	Anshuman Khandual, Peter Zijlstra, Frederic Weisbecker,
	Kees Cook, Masahiro Yamada, Nathan Chancellor, Sami Tolvanen,
	Alexei Starovoitov, Nick Terrell, Chris Wilson, Vlastimil Babka,
	linux-arm-kernel, linuxppc-dev

All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 arch/arm/kernel/smp.c       |  2 --
 arch/powerpc/kexec/crash.c  |  3 ---
 include/linux/hardirq.h     |  2 --
 include/linux/printk.h      | 12 ------------
 init/Kconfig                |  5 -----
 kernel/printk/internal.h    |  6 ------
 kernel/printk/printk_safe.c | 37 +------------------------------------
 kernel/trace/trace.c        |  2 --
 8 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 74679240a9d8..0dd2d733ad62 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -668,9 +668,7 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_BACKTRACE:
-		printk_nmi_enter();
 		nmi_cpu_backtrace(get_irq_regs());
-		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index c9a889880214..d488311efab1 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -311,9 +311,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
-	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
-	printk_nmi_enter();
-
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 69bc86ea382c..76878b357ffa 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -116,7 +116,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		printk_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
 		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 	} while (0)
@@ -135,7 +134,6 @@ extern void rcu_nmi_exit(void);
 	do {							\
 		BUG_ON(!in_nmi());				\
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
-		printk_nmi_exit();				\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 664612f75dac..14a18fa15c92 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -149,18 +149,6 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
 struct dev_printk_info;
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 5babea38e346..4053588db7a0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1488,11 +1488,6 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
-config PRINTK_NMI
-	def_bool y
-	depends on PRINTK
-	depends on HAVE_NMI
-
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6cc35c5de890..ff890ae3ee6a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -6,12 +6,6 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x007ffffff
-#define PRINTK_NMI_DIRECT_CONTEXT_MASK	0x008000000
-#define PRINTK_NMI_CONTEXT_MASK		0xff0000000
-
-#define PRINTK_NMI_CONTEXT_OFFSET	0x010000000
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 0456cd48d01c..47d961149a06 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -4,12 +4,9 @@
  */
 
 #include <linux/preempt.h>
-#include <linux/spinlock.h>
-#include <linux/debug_locks.h>
 #include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
-#include <linux/irq_work.h>
 #include <linux/printk.h>
 #include <linux/kprobes.h>
 
@@ -17,35 +14,6 @@
 
 static DEFINE_PER_CPU(int, printk_context);
 
-#ifdef CONFIG_PRINTK_NMI
-void noinstr printk_nmi_enter(void)
-{
-	this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-void noinstr printk_nmi_exit(void)
-{
-	this_cpu_sub(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-/*
- * Marks a code that might produce many messages in NMI context
- * and the risk of losing them is more critical than eventual
- * reordering.
- */
-void printk_nmi_direct_enter(void)
-{
-	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
-		this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-void printk_nmi_direct_exit(void)
-{
-	this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
-}
-
-#endif /* CONFIG_PRINTK_NMI */
-
 /* Can be preempted by NMI. */
 void __printk_safe_enter(void)
 {
@@ -70,10 +38,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) &
-	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
-	     PRINTK_NMI_CONTEXT_MASK |
-	     PRINTK_SAFE_CONTEXT_MASK)) {
+	if (this_cpu_read(printk_context) || in_nmi()) {
 		unsigned long flags;
 		int len;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 915fe8790f04..6ac254f7a04c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9409,7 +9409,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	tracing_off();
 
 	local_irq_save(flags);
-	printk_nmi_direct_enter();
 
 	/* Simulate the iterator */
 	trace_init_global_iter(&iter);
@@ -9491,7 +9490,6 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 		atomic_dec(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
 	}
 	atomic_dec(&dump_running);
-	printk_nmi_direct_exit();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH printk v3 5/6] printk: convert @syslog_lock to mutex
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
                   ` (3 preceding siblings ...)
  2021-06-24 11:11 ` [PATCH printk v3 4/6] printk: remove NMI tracking John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
  5 siblings, 0 replies; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

@syslog_lock was a raw_spin_lock to simplify the transition of
removing @logbuf_lock and the safe buffers. With that transition
complete, and since all uses of @syslog_lock are within sleepable
contexts, @syslog_lock can become a mutex.

Note that until now register_console() would disable interrupts
using irqsave, which implies that it may be called with interrupts
disabled. And indeed, there is one possible call chain on parisc
where this happens:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
    pdc_console_init_force()
      register_console()

However, register_console() calls console_lock(), which might sleep.
So it has never been allowed to call register_console() from an
atomic context and the above call chain is a bug.

Note that the removal of read_syslog_seq_irq() is slightly changing
the behavior of SYSLOG_ACTION_READ by testing against a possibly
outdated @seq value. However, the value of @seq could have changed
after the test, so it is not a new window. A follow-up commit closes
this window.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 49 +++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 495520b7369c..90954cb5a0ab 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,7 +356,7 @@ enum log_flags {
 };
 
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
-static DEFINE_RAW_SPINLOCK(syslog_lock);
+static DEFINE_MUTEX(syslog_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1497,9 +1497,9 @@ static int syslog_print(char __user *buf, int size)
 		size_t n;
 		size_t skip;
 
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
-			raw_spin_unlock_irq(&syslog_lock);
+			mutex_unlock(&syslog_lock);
 			break;
 		}
 		if (r.info->seq != syslog_seq) {
@@ -1528,7 +1528,7 @@ static int syslog_print(char __user *buf, int size)
 			syslog_partial += n;
 		} else
 			n = 0;
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 
 		if (!n)
 			break;
@@ -1592,9 +1592,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 	}
 
 	if (clear) {
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		latched_seq_write(&clear_seq, seq);
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 	}
 
 	kfree(text);
@@ -1603,21 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 
 static void syslog_clear(void)
 {
-	raw_spin_lock_irq(&syslog_lock);
+	mutex_lock(&syslog_lock);
 	latched_seq_write(&clear_seq, prb_next_seq(prb));
-	raw_spin_unlock_irq(&syslog_lock);
-}
-
-/* Return a consistent copy of @syslog_seq. */
-static u64 read_syslog_seq_irq(void)
-{
-	u64 seq;
-
-	raw_spin_lock_irq(&syslog_lock);
-	seq = syslog_seq;
-	raw_spin_unlock_irq(&syslog_lock);
-
-	return seq;
+	mutex_unlock(&syslog_lock);
 }
 
 int do_syslog(int type, char __user *buf, int len, int source)
@@ -1626,6 +1614,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
+	u64 seq;
 
 	error = check_syslog_permissions(type, source);
 	if (error)
@@ -1644,8 +1633,12 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		if (!access_ok(buf, len))
 			return -EFAULT;
 
-		error = wait_event_interruptible(log_wait,
-				prb_read_valid(prb, read_syslog_seq_irq(), NULL));
+		/* Get a consistent copy of @syslog_seq. */
+		mutex_lock(&syslog_lock);
+		seq = syslog_seq;
+		mutex_unlock(&syslog_lock);
+
+		error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
 		if (error)
 			return error;
 		error = syslog_print(buf, len);
@@ -1693,10 +1686,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		raw_spin_lock_irq(&syslog_lock);
+		mutex_lock(&syslog_lock);
 		if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
 			/* No unread messages. */
-			raw_spin_unlock_irq(&syslog_lock);
+			mutex_unlock(&syslog_lock);
 			return 0;
 		}
 		if (info.seq != syslog_seq) {
@@ -1714,7 +1707,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
 			unsigned int line_count;
-			u64 seq;
 
 			prb_for_each_info(syslog_seq, prb, seq, &info,
 					  &line_count) {
@@ -1724,7 +1716,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			}
 			error -= syslog_partial;
 		}
-		raw_spin_unlock_irq(&syslog_lock);
+		mutex_unlock(&syslog_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -2929,7 +2921,6 @@ static int try_enable_new_console(struct console *newcon, bool user_specified)
  */
 void register_console(struct console *newcon)
 {
-	unsigned long flags;
 	struct console *bcon = NULL;
 	int err;
 
@@ -3034,9 +3025,9 @@ void register_console(struct console *newcon)
 		exclusive_console_stop_seq = console_seq;
 
 		/* Get a consistent copy of @syslog_seq. */
-		raw_spin_lock_irqsave(&syslog_lock, flags);
+		mutex_lock(&syslog_lock);
 		console_seq = syslog_seq;
-		raw_spin_unlock_irqrestore(&syslog_lock, flags);
+		mutex_unlock(&syslog_lock);
 	}
 	console_unlock();
 	console_sysfs_notify();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
                   ` (4 preceding siblings ...)
  2021-06-24 11:11 ` [PATCH printk v3 5/6] printk: convert @syslog_lock to mutex John Ogness
@ 2021-06-24 11:11 ` John Ogness
  2021-06-24 14:57   ` Petr Mladek
                     ` (2 more replies)
  5 siblings, 3 replies; 23+ messages in thread
From: John Ogness @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Syslog's SYSLOG_ACTION_READ is supposed to block until the next
syslog record can be read, and then it should read that record.
However, because @syslog_lock is not held between waking up and
reading the record, another reader could read the record first,
thus causing SYSLOG_ACTION_READ to return with a value of 0, never
having read _anything_.

By holding @syslog_lock between waking up and reading, it can be
guaranteed that SYSLOG_ACTION_READ blocks until it successfully
reads a syslog record (or a real error occurs).

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 50 +++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 90954cb5a0ab..4737804d6c6d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1486,6 +1486,7 @@ static int syslog_print(char __user *buf, int size)
 	struct printk_record r;
 	char *text;
 	int len = 0;
+	u64 seq;
 
 	text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
 	if (!text)
@@ -1493,11 +1494,38 @@ static int syslog_print(char __user *buf, int size)
 
 	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
-	while (size > 0) {
+	/* Get a consistent copy of @syslog_seq. */
+	mutex_lock(&syslog_lock);
+	seq = syslog_seq;
+	mutex_unlock(&syslog_lock);
+
+	/* Wait for the @syslog_seq record to be available. */
+	for (;;) {
+		len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
+		if (len)
+			goto out;
+
+		/*
+		 * @syslog_seq may have changed while waiting. If so, wait
+		 * for the new @syslog_seq record.
+		 */
+
+		mutex_lock(&syslog_lock);
+		if (syslog_seq == seq)
+			break;
+		seq = syslog_seq;
+		mutex_unlock(&syslog_lock);
+	}
+
+	/*
+	 * @syslog_lock is held when entering the read loop to prevent
+	 * another reader from modifying @syslog_seq.
+	 */
+
+	for (;;) {
 		size_t n;
 		size_t skip;
 
-		mutex_lock(&syslog_lock);
 		if (!prb_read_valid(prb, syslog_seq, &r)) {
 			mutex_unlock(&syslog_lock);
 			break;
@@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
 		len += n;
 		size -= n;
 		buf += n;
-	}
 
+		if (!size)
+			break;
+
+		mutex_lock(&syslog_lock);
+	}
+out:
 	kfree(text);
 	return len;
 }
@@ -1614,7 +1647,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
-	u64 seq;
 
 	error = check_syslog_permissions(type, source);
 	if (error)
@@ -1632,15 +1664,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 			return 0;
 		if (!access_ok(buf, len))
 			return -EFAULT;
-
-		/* Get a consistent copy of @syslog_seq. */
-		mutex_lock(&syslog_lock);
-		seq = syslog_seq;
-		mutex_unlock(&syslog_lock);
-
-		error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
-		if (error)
-			return error;
 		error = syslog_print(buf, len);
 		break;
 	/* Read/clear last kernel messages */
@@ -1707,6 +1730,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
 			unsigned int line_count;
+			u64 seq;
 
 			prb_for_each_info(syslog_seq, prb, seq, &info,
 					  &line_count) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs
  2021-06-24 11:11 ` [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs John Ogness
@ 2021-06-24 12:26   ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2021-06-24 12:26 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney

On Thu 2021-06-24 13:17:43, John Ogness wrote:
> Currently the nmi_backtrace is serialized against other CPUs because
> the messages are sent to the NMI buffers. Once these buffers are
> removed, only the dumped stack will be serialized against other CPUs
> (via the printk_cpu_lock).
> 
> Also serialize the nmi_backtrace banner and regs using the
> printk_cpu_lock so that per-CPU serialization will be preserved even
> after the NMI buffers are removed.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 2/6] printk: track/limit recursion
  2021-06-24 11:11 ` [PATCH printk v3 2/6] printk: track/limit recursion John Ogness
@ 2021-06-24 12:55   ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2021-06-24 12:55 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2021-06-24 13:17:44, John Ogness wrote:
> Currently the printk safe buffers provide a form of recursion
> protection by redirecting to the safe buffers whenever printk() is
> recursively called.
> 
> In preparation for removal of the safe buffers, provide an alternate
> explicit recursion protection. Recursion is limited to 3 levels
> per-CPU and per-context.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

It looks pretty straightforward.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 3/6] printk: remove safe buffers
  2021-06-24 11:11 ` [PATCH printk v3 3/6] printk: remove safe buffers John Ogness
@ 2021-06-24 14:49   ` Petr Mladek
  2021-06-24 15:35     ` John Ogness
  0 siblings, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2021-06-24 14:49 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Eric Biederman, Nicholas Piggin,
	Christophe Leroy, Cédric Le Goater, Andrew Morton,
	Kees Cook, Tiezhu Yang, Yue Hu, Alexey Kardashevskiy,
	Paul E. McKenney, linuxppc-dev, kexec

On Thu 2021-06-24 13:17:45, John Ogness wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
> 
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
> 
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console lock is left in place. This is
> because the console lock is needed for the actual printing.

Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

There are some comments below.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1852,7 +1839,7 @@ static int console_trylock_spinning(void)
>  	if (console_trylock())
>  		return 1;
>  
> -	printk_safe_enter_irqsave(flags);
> +	local_irq_save(flags);
>  
>  	raw_spin_lock(&console_owner_lock);

This spin_lock is in the printk() path. We must make sure that
it does not cause deadlock.

printk_safe_enter_irqsave(flags) prevented the recursion because
it deferred the console handling.

One danger might be a lockdep report triggered by
raw_spin_lock(&console_owner_lock) itself. But it should be safe.
lockdep is checked before the lock is actually taken
and lockdep should disable itself before printing anything.

Another danger might be any printk() called under the lock.
The code just compares and assigns values to some variables
(static, on stack) so we should be on the safe side.

Well, I would feel more comfortable if we add printk_safe_enter_irqsave()
back around the sections guarded by this lock. It can be done
in a separate patch. The code looks safe at the moment.


> @@ -2664,9 +2648,9 @@ void console_unlock(void)
>  
>  	for (;;) {
>  		size_t ext_len = 0;
> +		int handover;
>  		size_t len;
>  
> -		printk_safe_enter_irqsave(flags);
>  skip:
>  		if (!prb_read_valid(prb, console_seq, &r))
>  			break;
> @@ -2716,19 +2700,22 @@ void console_unlock(void)
>  		 * were to occur on another CPU, it may wait for this one to
>  		 * finish. This task can not be preempted if there is a
>  		 * waiter waiting to take over.
> +		 *
> +		 * Interrupts are disabled because the hand over to a waiter
> +		 * must not be interrupted until the hand over is completed
> +		 * (@console_waiter is cleared).
>  		 */
> +		local_irq_save(flags);
>  		console_lock_spinning_enable();

Same here. console_lock_spinning_enable() takes console_owner_lock.
I would feel more comfortable if we added printk_safe_enter_irqsave(flags)
inside console_lock_spinning_enable() around the locked code. The code
is safe at the moment but...

>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> -		if (console_lock_spinning_disable_and_check()) {
> -			printk_safe_exit_irqrestore(flags);
> +		handover = console_lock_spinning_disable_and_check();

Same here. Also console_lock_spinning_disable_and_check() takes
console_owner_lock. It looks safe at the moment but...


> +		local_irq_restore(flags);
> +		if (handover)
>  			return;
> -		}
> -
> -		printk_safe_exit_irqrestore(flags);
>  
>  		if (do_cond_resched)
>  			cond_resched();

> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>  	 * Use the main logbuf even in NMI. But avoid calling console
>  	 * drivers that might have their own locks.
>  	 */
> -	if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> +	if (this_cpu_read(printk_context) &
> +	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
> +	     PRINTK_NMI_CONTEXT_MASK |
> +	     PRINTK_SAFE_CONTEXT_MASK)) {
>  		unsigned long flags;
>  		int len;
>  

There is the following code right below:

		printk_safe_enter_irqsave(flags);
		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
		printk_safe_exit_irqrestore(flags);
		defer_console_output();
		return len;

printk_safe_enter_irqsave(flags) is not needed here. Any nested
printk() ends here as well.

Against this can be done in a separate patch. Well, the commit message
mentions that the printk_safe context is removed everywhere except
for the code manipulating console lock. But is it just a detail.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
@ 2021-06-24 14:57   ` Petr Mladek
  2021-06-24 15:25   ` Petr Mladek
  2021-06-25 13:33   ` Steven Rostedt
  2 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2021-06-24 14:57 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2021-06-24 13:17:48, John Ogness wrote:
> Syslog's SYSLOG_ACTION_READ is supposed to block until the next
> syslog record can be read, and then it should read that record.
> However, because @syslog_lock is not held between waking up and
> reading the record, another reader could read the record first,
> thus causing SYSLOG_ACTION_READ to return with a value of 0, never
> having read _anything_.
> 
> By holding @syslog_lock between waking up and reading, it can be
> guaranteed that SYSLOG_ACTION_READ blocks until it successfully
> reads a syslog record (or a real error occurs).
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
  2021-06-24 14:57   ` Petr Mladek
@ 2021-06-24 15:25   ` Petr Mladek
  2021-06-25  8:11     ` John Ogness
  2021-06-25 13:33   ` Steven Rostedt
  2 siblings, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2021-06-24 15:25 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2021-06-24 13:17:48, John Ogness wrote:
> Syslog's SYSLOG_ACTION_READ is supposed to block until the next
> syslog record can be read, and then it should read that record.
> However, because @syslog_lock is not held between waking up and
> reading the record, another reader could read the record first,
> thus causing SYSLOG_ACTION_READ to return with a value of 0, never
> having read _anything_.
> 
> By holding @syslog_lock between waking up and reading, it can be
> guaranteed that SYSLOG_ACTION_READ blocks until it successfully
> reads a syslog record (or a real error occurs).
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/printk.c | 50 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 37 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 90954cb5a0ab..4737804d6c6d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
>  		len += n;
>  		size -= n;
>  		buf += n;
> -	}
>  
> +		if (!size)
> +			break;

This looks like an unrelated optimization. If I get it correctly,
it does not change the existing behavior. The next cycle would
end up with n == 0 and break anyway.

It would have been better to do it in a separate patch or do not do
it at all or at least mention it in the commit message.

> +
> +		mutex_lock(&syslog_lock);
> +	}
> +out:
>  	kfree(text);
>  	return len;
>  }

The patch itself makes sense. It somehow fixes a long standing race.
Even though the result still might be racy. The lock is released
when each record is copied to the user-provided buffer.

I do not want to block it because of details. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

but I would feel more comfortable if we handled the optimization one
of the suggested way.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 3/6] printk: remove safe buffers
  2021-06-24 14:49   ` Petr Mladek
@ 2021-06-24 15:35     ` John Ogness
  2021-06-25 12:41       ` Petr Mladek
  0 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-24 15:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Eric Biederman, Nicholas Piggin,
	Christophe Leroy, Cédric Le Goater, Andrew Morton,
	Kees Cook, Tiezhu Yang, Yue Hu, Alexey Kardashevskiy,
	Paul E. McKenney, linuxppc-dev, kexec

On 2021-06-24, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1852,7 +1839,7 @@ static int console_trylock_spinning(void)
>>  	if (console_trylock())
>>  		return 1;
>>  
>> -	printk_safe_enter_irqsave(flags);
>> +	local_irq_save(flags);
>>  
>>  	raw_spin_lock(&console_owner_lock);
>
> This spin_lock is in the printk() path. We must make sure that
> it does not cause deadlock.
>
> printk_safe_enter_irqsave(flags) prevented the recursion because
> it deferred the console handling.
>
> One danger might be a lockdep report triggered by
> raw_spin_lock(&console_owner_lock) itself. But it should be safe.
> lockdep is checked before the lock is actually taken
> and lockdep should disable itself before printing anything.
>
> Another danger might be any printk() called under the lock.
> The code just compares and assigns values to some variables
> (static, on stack) so we should be on the safe side.
>
> Well, I would feel more comfortable if we add printk_safe_enter_irqsave()
> back around the sections guarded by this lock. It can be done
> in a separate patch. The code looks safe at the moment.

You are correct. printk_safe should also be wrapping @console_owner_lock
locking.

>> @@ -2716,19 +2700,22 @@ void console_unlock(void)
>>  		 * were to occur on another CPU, it may wait for this one to
>>  		 * finish. This task can not be preempted if there is a
>>  		 * waiter waiting to take over.
>> +		 *
>> +		 * Interrupts are disabled because the hand over to a waiter
>> +		 * must not be interrupted until the hand over is completed
>> +		 * (@console_waiter is cleared).
>>  		 */
>> +		local_irq_save(flags);
>>  		console_lock_spinning_enable();
>
> Same here. console_lock_spinning_enable() takes console_owner_lock.
> I would feel more comfortable if we added printk_safe_enter_irqsave(flags)
> inside console_lock_spinning_enable() around the locked code. The code
> is safe at the moment but...

Agreed.

>>  		stop_critical_timings();	/* don't trace print latency */
>>  		call_console_drivers(ext_text, ext_len, text, len);
>>  		start_critical_timings();
>>  
>> -		if (console_lock_spinning_disable_and_check()) {
>> -			printk_safe_exit_irqrestore(flags);
>> +		handover = console_lock_spinning_disable_and_check();
>
> Same here. Also console_lock_spinning_disable_and_check() takes
> console_owner_lock. It looks safe at the moment but...

Agreed.

>> --- a/kernel/printk/printk_safe.c
>> +++ b/kernel/printk/printk_safe.c
>> @@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>>  	 * Use the main logbuf even in NMI. But avoid calling console
>>  	 * drivers that might have their own locks.
>>  	 */
>> -	if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> +	if (this_cpu_read(printk_context) &
>> +	    (PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> +	     PRINTK_NMI_CONTEXT_MASK |
>> +	     PRINTK_SAFE_CONTEXT_MASK)) {
>>  		unsigned long flags;
>>  		int len;
>>  
>
> There is the following code right below:
>
> 		printk_safe_enter_irqsave(flags);
> 		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
> 		printk_safe_exit_irqrestore(flags);
> 		defer_console_output();
> 		return len;
>
> printk_safe_enter_irqsave(flags) is not needed here. Any nested
> printk() ends here as well.

Ah, I missed that one. Good eye!

> Against this can be done in a separate patch. Well, the commit message
> mentions that the printk_safe context is removed everywhere except
> for the code manipulating console lock. But is it just a detail.

I would prefer a v4 with these fixes:

- wrap @console_owner_lock with printk_safe usage

- remove unnecessary printk_safe usage from printk_safe.c

- update commit message to say that safe context tracking is left in
  place for both the console and console_owner locks

John Ogness

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-24 15:25   ` Petr Mladek
@ 2021-06-25  8:11     ` John Ogness
  2021-06-25 14:55       ` Petr Mladek
  0 siblings, 1 reply; 23+ messages in thread
From: John Ogness @ 2021-06-25  8:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2021-06-24, Petr Mladek <pmladek@suse.com> wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 90954cb5a0ab..4737804d6c6d 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
>>  		len += n;
>>  		size -= n;
>>  		buf += n;
>> -	}
>>  
>> +		if (!size)
>> +			break;
>
> This looks like an unrelated optimization. If I get it correctly, it
> does not change the existing behavior.

It was a necessary change in order to preserve the existing logic but
allow the lock to be held when enterring the loop. Before the patch we
have:

        ...get seq to read...

        while (size > 0) {
            mutex_lock(&syslog_lock);
            ...read record...
            mutex_unlock(&syslog_lock);
            ...copy record...
       }

After the patch we enter the loop with the lock already held. So this
changes the code to:

        mutex_lock(&syslog_lock);
        ...get seq to read...

        for (;;) {
            ...read record...
            mutex_unlock(&syslog_lock);
            ...copy record...
           
            if (!size)
                break;
            mutex_lock(&syslog_lock);               
        }

Note that @size always starts with >0, so there is no need to check it
at the beginning of the loop. And checking for !0 instead of >0 is also
ok, since @size will never be less than zero.

> The next cycle would end up with n == 0 and break anyway.

Doing an extra loop of reading more data and sprinting it into the
temporary buffer even though we know the user buffer is not desirable.

If you insisted on keeping the "while (size > 0)" loop, then there would
be an unnecessary lock/unlock call and the code gets even more complex.

I could add some comments to the implementation if you prefer.

> The patch itself makes sense. It somehow fixes a long standing race.
> Even though the result still might be racy. The lock is released
> when each record is copied to the user-provided buffer.

I do not understand this conclusion. The existing race is
real. SYSLOG_ACTION_READ could return with no data, not because there is
no records available, but because the race was hit. With this patch that
race is closed: SYSLOG_ACTION_READ will either return with data or with
an error.

You claim the result is still racy, but I do not know what you are
referring to. If you have multiple readers, they will get different
records (and record pieces), but collectively no data would be lost and
no data would be redundant. And no readers would return from
SYSLOG_ACTION_READ without data.

> I would feel more comfortable if we handled the optimization one of
> the suggested way.

There is no optimization here. Perhaps you have missed that the loop
changes from "while (size > 0)" to "for (;;)".

John Ogness

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 4/6] printk: remove NMI tracking
  2021-06-24 11:11 ` [PATCH printk v3 4/6] printk: remove NMI tracking John Ogness
@ 2021-06-25 12:36   ` Petr Mladek
  2021-06-25 13:34     ` Russell King (Oracle)
  0 siblings, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2021-06-25 12:36 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Russell King, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
	Marc Zyngier, Valentin Schneider, Andrew Morton, Pekka Enberg,
	Mike Rapoport, Wolfram Sang (Renesas),
	Anshuman Khandual, Peter Zijlstra, Frederic Weisbecker,
	Kees Cook, Masahiro Yamada, Nathan Chancellor, Sami Tolvanen,
	Alexei Starovoitov, Nick Terrell, Chris Wilson, Vlastimil Babka,
	linux-arm-kernel, linuxppc-dev

On Thu 2021-06-24 13:17:46, John Ogness wrote:
> All NMI contexts are handled the same as the safe context: store the
> message and defer printing. There is no need to have special NMI
> context tracking for this. Using in_nmi() is enough.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> ---
>  arch/arm/kernel/smp.c       |  2 --
>  arch/powerpc/kexec/crash.c  |  3 ---
>  include/linux/hardirq.h     |  2 --
>  include/linux/printk.h      | 12 ------------
>  init/Kconfig                |  5 -----
>  kernel/printk/internal.h    |  6 ------
>  kernel/printk/printk_safe.c | 37 +------------------------------------
>  kernel/trace/trace.c        |  2 --
>  8 files changed, 1 insertion(+), 68 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 74679240a9d8..0dd2d733ad62 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -668,9 +668,7 @@ static void do_handle_IPI(int ipinr)
>  		break;
>  
>  	case IPI_CPU_BACKTRACE:
> -		printk_nmi_enter();
>  		nmi_cpu_backtrace(get_irq_regs());
> -		printk_nmi_exit();

It looks to me that in_nmi() returns false here. As a result,
nmi_cpu_backtrace() might newly call consoles immediately.

If I recall correctly, arm does not have a proper NMI.
And this is just some special case of a "normal" IRQ.

And indeed, nmi_enter() is called only from handle_fiq_as_nmi()
and it is just a boiler plate.

If I am right, we should replace printk_nmi_enter() with
printk_safe_enter_irqsave(flags) or so.

Even better solution might be to call this within
nmi_enter()/nmi_exit(). But I am not sure if this is what
the arm people want.

Best Regards,
Petr

PS: Sigh, I have skipped this patch yesterday because it already had
my Reviewed-by. And I missed it before...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 3/6] printk: remove safe buffers
  2021-06-24 15:35     ` John Ogness
@ 2021-06-25 12:41       ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2021-06-25 12:41 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Eric Biederman, Nicholas Piggin,
	Christophe Leroy, Cédric Le Goater, Andrew Morton,
	Kees Cook, Tiezhu Yang, Yue Hu, Alexey Kardashevskiy,
	Paul E. McKenney, linuxppc-dev, kexec

On Thu 2021-06-24 17:41:56, John Ogness wrote:
> I would prefer a v4 with these fixes:
> 
> - wrap @console_owner_lock with printk_safe usage
> 
> - remove unnecessary printk_safe usage from printk_safe.c
> 
> - update commit message to say that safe context tracking is left in
>   place for both the console and console_owner locks

Sounds good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
  2021-06-24 14:57   ` Petr Mladek
  2021-06-24 15:25   ` Petr Mladek
@ 2021-06-25 13:33   ` Steven Rostedt
  2021-06-25 14:14     ` John Ogness
  2021-06-28 14:35     ` Petr Mladek
  2 siblings, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2021-06-25 13:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, linux-kernel

On Thu, 24 Jun 2021 13:17:48 +0206
John Ogness <john.ogness@linutronix.de> wrote:

> +	 * @syslog_lock is held when entering the read loop to prevent
> +	 * another reader from modifying @syslog_seq.

You should add to the above comment:

	 * And the @syslog_lock is released before exiting the loop.

Because it's not normal to enter a loop locked, and have it unlocked
when exiting the loop. And I can envision in the future, someone might
add a break (for error) while still holding the lock.

-- Steve

> +	 */
> +
> +	for (;;) {
>  		size_t n;
>  		size_t skip;
>  
> -		mutex_lock(&syslog_lock);
>  		if (!prb_read_valid(prb, syslog_seq, &r)) {
>  			mutex_unlock(&syslog_lock);
>  			break;
> @@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
>  		len += n;
>  		size -= n;
>  		buf += n;
> -	}
>  
> +		if (!size)
> +			break;
> +
> +		mutex_lock(&syslog_lock);
> +	}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 4/6] printk: remove NMI tracking
  2021-06-25 12:36   ` Petr Mladek
@ 2021-06-25 13:34     ` Russell King (Oracle)
  0 siblings, 0 replies; 23+ messages in thread
From: Russell King (Oracle) @ 2021-06-25 13:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Ingo Molnar, Marc Zyngier, Valentin Schneider,
	Andrew Morton, Pekka Enberg, Mike Rapoport,
	Wolfram Sang (Renesas),
	Anshuman Khandual, Peter Zijlstra, Frederic Weisbecker,
	Kees Cook, Masahiro Yamada, Nathan Chancellor, Sami Tolvanen,
	Alexei Starovoitov, Nick Terrell, Chris Wilson, Vlastimil Babka,
	linux-arm-kernel, linuxppc-dev

On Fri, Jun 25, 2021 at 02:36:23PM +0200, Petr Mladek wrote:
> On Thu 2021-06-24 13:17:46, John Ogness wrote:
> > All NMI contexts are handled the same as the safe context: store the
> > message and defer printing. There is no need to have special NMI
> > context tracking for this. Using in_nmi() is enough.
> > 
> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> > Reviewed-by: Petr Mladek <pmladek@suse.com>
> > ---
> >  arch/arm/kernel/smp.c       |  2 --
> >  arch/powerpc/kexec/crash.c  |  3 ---
> >  include/linux/hardirq.h     |  2 --
> >  include/linux/printk.h      | 12 ------------
> >  init/Kconfig                |  5 -----
> >  kernel/printk/internal.h    |  6 ------
> >  kernel/printk/printk_safe.c | 37 +------------------------------------
> >  kernel/trace/trace.c        |  2 --
> >  8 files changed, 1 insertion(+), 68 deletions(-)
> > 
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 74679240a9d8..0dd2d733ad62 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -668,9 +668,7 @@ static void do_handle_IPI(int ipinr)
> >  		break;
> >  
> >  	case IPI_CPU_BACKTRACE:
> > -		printk_nmi_enter();
> >  		nmi_cpu_backtrace(get_irq_regs());
> > -		printk_nmi_exit();
> 
> It looks to me that in_nmi() returns false here. As a result,
> nmi_cpu_backtrace() might newly call consoles immediately.
> 
> If I recall correctly, arm does not have a proper NMI.
> And this is just some special case of a "normal" IRQ.
> 
> And indeed, nmi_enter() is called only from handle_fiq_as_nmi()
> and it is just a boiler plate.
> 
> If I am right, we should replace printk_nmi_enter() with
> printk_safe_enter_irqsave(flags) or so.
> 
> Even better solution might be to call this within
> nmi_enter()/nmi_exit(). But I am not sure if this is what
> the arm people want.

As I seem to recall, the guy in ARM Ltd who was working on this seemed
to drift away and it never got finished - however, I've always carried
platform specific hacks in my tree to make this work from FIQ on the
platforms I cared about:

http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=fiq

Not suitable for mainline like that. I'm not aware of anyone working on
it now.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-25 13:33   ` Steven Rostedt
@ 2021-06-25 14:14     ` John Ogness
  2021-06-28 14:35     ` Petr Mladek
  1 sibling, 0 replies; 23+ messages in thread
From: John Ogness @ 2021-06-25 14:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, linux-kernel

On 2021-06-25, Steven Rostedt <rostedt@goodmis.org> wrote:
>> +	 * @syslog_lock is held when entering the read loop to prevent
>> +	 * another reader from modifying @syslog_seq.
>
> You should add to the above comment:
>
> 	 * And the @syslog_lock is released before exiting the loop.
>
> Because it's not normal to enter a loop locked, and have it unlocked
> when exiting the loop. And I can envision in the future, someone might
> add a break (for error) while still holding the lock.

Agreed. Thanks.

John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-25  8:11     ` John Ogness
@ 2021-06-25 14:55       ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2021-06-25 14:55 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Fri 2021-06-25 10:17:40, John Ogness wrote:
> On 2021-06-24, Petr Mladek <pmladek@suse.com> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 90954cb5a0ab..4737804d6c6d 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
> >>  		len += n;
> >>  		size -= n;
> >>  		buf += n;
> >> -	}
> >>  
> >> +		if (!size)
> >> +			break;
> >
> > This looks like an unrelated optimization. If I get it correctly, it
> > does not change the existing behavior.
> 
> It was a necessary change in order to preserve the existing logic but
> allow the lock to be held when enterring the loop. Before the patch we
> have:
> 
>         ...get seq to read...
> 
>         while (size > 0) {
>             mutex_lock(&syslog_lock);
>             ...read record...
>             mutex_unlock(&syslog_lock);
>             ...copy record...
>        }
> 
> After the patch we enter the loop with the lock already held. So this
> changes the code to:
> 
>         mutex_lock(&syslog_lock);
>         ...get seq to read...
> 
>         for (;;) {
>             ...read record...
>             mutex_unlock(&syslog_lock);
>             ...copy record...
>            
>             if (!size)
>                 break;
>             mutex_lock(&syslog_lock);               
>         }
> 
> Note that @size always starts with >0, so there is no need to check it
> at the beginning of the loop. And checking for !0 instead of >0 is also
> ok, since @size will never be less than zero.

Ah, I have missed that you replaced the while-cycle with a for-cycle.
It makes sense now.

Plese, just mention these changes in the commit message. I mean that
size is always >0 at the befinning and never <0 later.

> > The patch itself makes sense. It somehow fixes a long standing race.
> > Even though the result still might be racy. The lock is released
> > when each record is copied to the user-provided buffer.
> 
> I do not understand this conclusion. The existing race is
> real. SYSLOG_ACTION_READ could return with no data, not because there is
> no records available, but because the race was hit. With this patch that
> race is closed: SYSLOG_ACTION_READ will either return with data or with
> an error.
> 
> You claim the result is still racy, but I do not know what you are
> referring to. If you have multiple readers, they will get different
> records (and record pieces), but collectively no data would be lost and
> no data would be redundant. And no readers would return from
> SYSLOG_ACTION_READ without data.

I mean that each reader will still get random lines. The race is that
it is not guaranteed what reader would get what lines.

By other words, the improvement is that each reader will read
at least something. But it is still not guaranteed that it will
see everything.

My understanding is that it was designed for a single daemon reading
all messages. And dmesg might probably cause races when using
the syslog interface.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-25 13:33   ` Steven Rostedt
  2021-06-25 14:14     ` John Ogness
@ 2021-06-28 14:35     ` Petr Mladek
  2021-06-28 14:52       ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2021-06-28 14:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: John Ogness, Sergey Senozhatsky, Thomas Gleixner, linux-kernel

On Fri 2021-06-25 09:33:54, Steven Rostedt wrote:
> On Thu, 24 Jun 2021 13:17:48 +0206
> John Ogness <john.ogness@linutronix.de> wrote:
> 
> > +	 * @syslog_lock is held when entering the read loop to prevent
> > +	 * another reader from modifying @syslog_seq.
> 
> You should add to the above comment:
> 
> 	 * And the @syslog_lock is released before exiting the loop.
> 
> Because it's not normal to enter a loop locked, and have it unlocked
> when exiting the loop. And I can envision in the future, someone might
> add a break (for error) while still holding the lock.

I was double checking the code and the locking is really hard to
follow. I would if the following approach make it easier. The main
trick is that the lock is taken at the beginnig and release at
the end. It is only temporary released around a single line
when needed.

static int syslog_print(char __user *buf, int size)
{
	struct printk_info info;
	struct printk_record r;
	char *text;
	int len = 0;
	u64 seq;

	text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
	if (!text)
		return -ENOMEM;

	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);

	mutex_lock(&syslog_lock);

	/*
	 * Wait for the @syslog_seq record to be vailable. @syslog_seq may
	 * change while waiting.
	 */
	do {
		seq = syslog_seq;

		mutex_unlock(&syslog_lock);
		len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
		mutex_lock(&syslog_lock);

		if (len)
			goto out;
	} while (syslog_seq != seq);

	/*
	 * Copy records that fit into the buffer. The above cycle makes sure
	 * that the first record is always available.
	 */
	do {
		size_t n;
		size_t skip;
		unsigned long err;

		if (!prb_read_valid(prb, syslog_seq, &r))
			break;

		if (r.info->seq != syslog_seq) {
			/* message is gone, move to next valid one */
			syslog_seq = r.info->seq;
			syslog_partial = 0;
		}

		/*
		 * To keep reading/counting partial line consistent,
		 * use printk_time value as of the beginning of a line.
		 */
		if (!syslog_partial)
			syslog_time = printk_time;

		skip = syslog_partial;
		n = record_print_text(&r, true, syslog_time);
		if (n - syslog_partial <= size) {
			/* message fits into buffer, move forward */
			syslog_seq = r.info->seq + 1;
			n -= syslog_partial;
			syslog_partial = 0;
		} else if (!len){
			/* partial read(), remember position */
			n = size;
			syslog_partial += n;
		} else
			n = 0;

		if (!n)
			break;

		mutex_unlock(&syslog_lock);
		err = copy_to_user(buf, text + skip, n);
		mutex_lock(&syslog_lock);

		if (err && !len) {
			len = -EFAULT;
			break;
		}

		len += n;
		size -= n;
		buf += n;
	} while(size);
out:
	mutex_unlock(&syslog_lock);
	kfree(text);
	return len;
}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-28 14:35     ` Petr Mladek
@ 2021-06-28 14:52       ` Steven Rostedt
  2021-06-28 15:00         ` John Ogness
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2021-06-28 14:52 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Thomas Gleixner, linux-kernel

On Mon, 28 Jun 2021 16:35:48 +0200
Petr Mladek <pmladek@suse.com> wrote:

> I was double checking the code and the locking is really hard to
> follow. I would if the following approach make it easier. The main
> trick is that the lock is taken at the beginnig and release at
> the end. It is only temporary released around a single line
> when needed.
> 
> static int syslog_print(char __user *buf, int size)
> {
> 	struct printk_info info;
> 	struct printk_record r;
> 	char *text;
> 	int len = 0;
> 	u64 seq;
> 
> 	text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
> 	if (!text)
> 		return -ENOMEM;
> 
> 	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
> 
> 	mutex_lock(&syslog_lock);
> 
> 	/*
> 	 * Wait for the @syslog_seq record to be vailable. @syslog_seq may
> 	 * change while waiting.
> 	 */
> 	do {
> 		seq = syslog_seq;
> 
> 		mutex_unlock(&syslog_lock);
> 		len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
> 		mutex_lock(&syslog_lock);
> 
> 		if (len)
> 			goto out;
> 	} while (syslog_seq != seq);
> 
> 	/*
> 	 * Copy records that fit into the buffer. The above cycle makes sure
> 	 * that the first record is always available.
> 	 */
> 	do {
> 		size_t n;
> 		size_t skip;
> 		unsigned long err;
> 
> 		if (!prb_read_valid(prb, syslog_seq, &r))
> 			break;
> 
> 		if (r.info->seq != syslog_seq) {
> 			/* message is gone, move to next valid one */
> 			syslog_seq = r.info->seq;
> 			syslog_partial = 0;
> 		}
> 
> 		/*
> 		 * To keep reading/counting partial line consistent,
> 		 * use printk_time value as of the beginning of a line.
> 		 */
> 		if (!syslog_partial)
> 			syslog_time = printk_time;
> 
> 		skip = syslog_partial;
> 		n = record_print_text(&r, true, syslog_time);
> 		if (n - syslog_partial <= size) {
> 			/* message fits into buffer, move forward */
> 			syslog_seq = r.info->seq + 1;
> 			n -= syslog_partial;
> 			syslog_partial = 0;
> 		} else if (!len){
> 			/* partial read(), remember position */
> 			n = size;
> 			syslog_partial += n;
> 		} else
> 			n = 0;
> 
> 		if (!n)
> 			break;
> 
> 		mutex_unlock(&syslog_lock);
> 		err = copy_to_user(buf, text + skip, n);
> 		mutex_lock(&syslog_lock);
> 
> 		if (err && !len) {
> 			len = -EFAULT;
> 			break;
> 		}
> 
> 		len += n;
> 		size -= n;
> 		buf += n;
> 	} while(size);
> out:
> 	mutex_unlock(&syslog_lock);
> 	kfree(text);
> 	return len;
> }

That's a much more common approach to locking, that may not be as
efficient, but is much easier to keep straight, and less error prone.

-- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH printk v3 6/6] printk: syslog: close window between wait and read
  2021-06-28 14:52       ` Steven Rostedt
@ 2021-06-28 15:00         ` John Ogness
  0 siblings, 0 replies; 23+ messages in thread
From: John Ogness @ 2021-06-28 15:00 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Sergey Senozhatsky, Thomas Gleixner, linux-kernel

On 2021-06-28, Steven Rostedt <rostedt@goodmis.org> wrote:
> Petr Mladek <pmladek@suse.com> wrote:
>> I was double checking the code and the locking is really hard to
>> follow. I would if the following approach make it easier. The main
>> trick is that the lock is taken at the beginnig and release at
>> the end. It is only temporary released around a single line
>> when needed.
>
> That's a much more common approach to locking, that may not be as
> efficient, but is much easier to keep straight, and less error prone.

OK. I will use this approach for v4. Thank you.

John

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-06-28 15:30 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 11:11 [PATCH printk v3 0/6] printk: remove safe buffers John Ogness
2021-06-24 11:11 ` [PATCH printk v3 1/6] lib/nmi_backtrace: explicitly serialize banner and regs John Ogness
2021-06-24 12:26   ` Petr Mladek
2021-06-24 11:11 ` [PATCH printk v3 2/6] printk: track/limit recursion John Ogness
2021-06-24 12:55   ` Petr Mladek
2021-06-24 11:11 ` [PATCH printk v3 3/6] printk: remove safe buffers John Ogness
2021-06-24 14:49   ` Petr Mladek
2021-06-24 15:35     ` John Ogness
2021-06-25 12:41       ` Petr Mladek
2021-06-24 11:11 ` [PATCH printk v3 4/6] printk: remove NMI tracking John Ogness
2021-06-25 12:36   ` Petr Mladek
2021-06-25 13:34     ` Russell King (Oracle)
2021-06-24 11:11 ` [PATCH printk v3 5/6] printk: convert @syslog_lock to mutex John Ogness
2021-06-24 11:11 ` [PATCH printk v3 6/6] printk: syslog: close window between wait and read John Ogness
2021-06-24 14:57   ` Petr Mladek
2021-06-24 15:25   ` Petr Mladek
2021-06-25  8:11     ` John Ogness
2021-06-25 14:55       ` Petr Mladek
2021-06-25 13:33   ` Steven Rostedt
2021-06-25 14:14     ` John Ogness
2021-06-28 14:35     ` Petr Mladek
2021-06-28 14:52       ` Steven Rostedt
2021-06-28 15:00         ` John Ogness

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).