* [PATCH printk v2 00/11] wire up nbcon atomic printing
@ 2023-09-19 23:08 John Ogness
  2023-09-19 23:08 ` [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon John Ogness
                   ` (10 more replies)
  0 siblings, 11 replies; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, Kees Cook, Luis Chamberlain,
	Andrew Morton, Peter Zijlstra, Josh Poimboeuf, Andy Shevchenko,
	Guilherme G. Piccoli, Arnd Bergmann, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, rcu, Ingo Molnar, Will Deacon, Waiman Long

Hi,

This is v2 of a series to wire up the nbcon consoles so that
they actually perform atomic printing. This series is only a
subset of the original v1 [0]. In particular, this series
represents patches 14 and 16-18 of the v1 series. For
information about the motivation of the atomic consoles,
please read the cover letter of v1.

This series focuses on providing the functionality and marking
of atomic printing sections as well as wiring up the nbcon
atomic printing to the general console_flush_all() function.
This series does _not_ include threaded printing or nbcon
drivers. Those features will be added in separate follow-up
series.

Note that the behavior of atomic printing in priority-elevated
atomic printing sections differs from the legacy consoles. With
atomic printing, the full set of urgent messages
(WARN/OOPS/PANIC) is first stored into the ringbuffer and
only afterwards flushed. This ensures the full backtraces are
available in the ringbuffer, even if they do not make it out to
the console(s). This is in accordance with what was discussed
at LPC 2022 [1].

A lot has changed since v1 and the patches no longer correlate
1:1. Here is an attempt to list the changes:

- Rather than flushing the full ringbuffer with one nbcon
  console before moving to the next nbcon console, adopt the
  same flushing strategy as the legacy consoles: rotation
  through the consoles, one record at a time.

- Introduce nbcon_atomic_emit_one() to perform the "lock, emit
  one record, unlock" pattern. (This is only a helper
  function.)

- Introduce nbcon_console_emit_next_record() to act as the
  nbcon variant of console_emit_next_record(). This allows
  straightforward integration of nbcon consoles into
  console_flush_all().

- nbcon_atomic_flush() no longer takes any arguments. These
  were awkward for the caller. The context object is now
  hidden from the caller.

- Introduce __nbcon_atomic_flush_all() as an internal helper
  function in order to hide the ability to attempt unsafe
  hostile takeovers from outside nbcon.c.

- For printk_trigger_flush(), migration is disabled instead of
  preemption.

- Add atomic write enforcement to oops and lockdep.

- Comments and kerneldoc updated.

John Ogness

[0] https://lore.kernel.org/lkml/20230302195618.156940-1-john.ogness@linutronix.de

[1] https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

John Ogness (7):
  printk: Make console_is_usable() available to nbcon
  printk: Let console_is_usable() handle nbcon
  printk: Add @flags argument for console_is_usable()
  printk: nbcon: Wire up nbcon into console_flush_all()
  panic: Add atomic write enforcement to oops
  rcu: Add atomic write enforcement for rcu stalls
  lockdep: Add atomic write enforcement for lockdep splats

Thomas Gleixner (4):
  printk: nbcon: Provide functions to mark atomic write sections
  printk: nbcon: Provide function for atomic flushing
  printk: nbcon: Wire up nbcon console atomic flushing
  panic: Add atomic write enforcement to warn/panic

 include/linux/console.h  |   4 +
 include/linux/printk.h   |   6 +
 kernel/locking/lockdep.c |   7 ++
 kernel/panic.c           |  66 +++++++++++
 kernel/printk/internal.h |  37 ++++++
 kernel/printk/nbcon.c    | 246 ++++++++++++++++++++++++++++++++++++++-
 kernel/printk/printk.c   |  67 +++++------
 kernel/rcu/tree_stall.h  |   6 +
 8 files changed, 396 insertions(+), 43 deletions(-)


base-commit: 9757acd0a700ba4a0d16dde4ba820eb052aba1a7
-- 
2.39.2


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22  8:33   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon John Ogness
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Move console_is_usable() as-is into internal.h so that it can
be used by nbcon printing functions as well.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h | 32 ++++++++++++++++++++++++++++++++
 kernel/printk/printk.c   | 30 ------------------------------
 2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6c2afee5ef62..6e8c1b02adae 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -78,6 +78,36 @@ bool nbcon_alloc(struct console *con);
 void nbcon_init(struct console *con);
 void nbcon_free(struct console *con);
 
+/*
+ * Check if the given console is currently capable and allowed to print
+ * records.
+ *
+ * Requires the console_srcu_read_lock.
+ */
+static inline bool console_is_usable(struct console *con)
+{
+	short flags = console_srcu_read_flags(con);
+
+	if (!(flags & CON_ENABLED))
+		return false;
+
+	if ((flags & CON_SUSPENDED))
+		return false;
+
+	if (!con->write)
+		return false;
+
+	/*
+	 * Console drivers may assume that per-cpu resources have been
+	 * allocated. So unless they're explicitly marked as being able to
+	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+	 */
+	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+		return false;
+
+	return true;
+}
+
 #else
 
 #define PRINTK_PREFIX_MAX	0
@@ -99,6 +129,8 @@ static inline bool nbcon_alloc(struct console *con) { return false; }
 static inline void nbcon_init(struct console *con) { }
 static inline void nbcon_free(struct console *con) { }
 
+static inline bool console_is_usable(struct console *con) { return false; }
+
 #endif /* CONFIG_PRINTK */
 
 extern struct printk_buffers printk_shared_pbufs;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 778359b21761..b6cfae75a574 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2679,36 +2679,6 @@ int is_console_locked(void)
 }
 EXPORT_SYMBOL(is_console_locked);
 
-/*
- * Check if the given console is currently capable and allowed to print
- * records.
- *
- * Requires the console_srcu_read_lock.
- */
-static inline bool console_is_usable(struct console *con)
-{
-	short flags = console_srcu_read_flags(con);
-
-	if (!(flags & CON_ENABLED))
-		return false;
-
-	if ((flags & CON_SUSPENDED))
-		return false;
-
-	if (!con->write)
-		return false;
-
-	/*
-	 * Console drivers may assume that per-cpu resources have been
-	 * allocated. So unless they're explicitly marked as being able to
-	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
-	 */
-	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
-		return false;
-
-	return true;
-}
-
 static void __console_unlock(void)
 {
 	console_locked = 0;
-- 
2.39.2



* [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
  2023-09-19 23:08 ` [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22  8:37   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable() John Ogness
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The nbcon consoles use a different printing callback. For nbcon
consoles, check for the write_atomic() callback instead of
write().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6e8c1b02adae..69c8861be92c 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -80,6 +80,8 @@ void nbcon_free(struct console *con);
 
 /*
  * Check if the given console is currently capable and allowed to print
+ * records. Note that this function does not consider the current context,
+ * which can also play a role in deciding if @con can be used to print
  * records.
  *
  * Requires the console_srcu_read_lock.
@@ -94,8 +96,13 @@ static inline bool console_is_usable(struct console *con)
 	if ((flags & CON_SUSPENDED))
 		return false;
 
-	if (!con->write)
-		return false;
+	if (flags & CON_NBCON) {
+		if (!con->write_atomic)
+			return false;
+	} else {
+		if (!con->write)
+			return false;
+	}
 
 	/*
 	 * Console drivers may assume that per-cpu resources have been
-- 
2.39.2



* [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable()
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
  2023-09-19 23:08 ` [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon John Ogness
  2023-09-19 23:08 ` [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22  8:41   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections John Ogness
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The caller of console_is_usable() usually needs @console->flags
for its own checks. Rather than having console_is_usable() read
its own copy, make the caller pass in @flags. This also
ensures that the caller and console_is_usable() act on the
same @flags value.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h | 8 ++------
 kernel/printk/printk.c   | 5 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 69c8861be92c..6780911fa8f2 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -83,13 +83,9 @@ void nbcon_free(struct console *con);
  * records. Note that this function does not consider the current context,
  * which can also play a role in deciding if @con can be used to print
  * records.
- *
- * Requires the console_srcu_read_lock.
  */
-static inline bool console_is_usable(struct console *con)
+static inline bool console_is_usable(struct console *con, short flags)
 {
-	short flags = console_srcu_read_flags(con);
-
 	if (!(flags & CON_ENABLED))
 		return false;
 
@@ -136,7 +132,7 @@ static inline bool nbcon_alloc(struct console *con) { return false; }
 static inline void nbcon_init(struct console *con) { }
 static inline void nbcon_free(struct console *con) { }
 
-static inline bool console_is_usable(struct console *con) { return false; }
+static inline bool console_is_usable(struct console *con, short flags) { return false; }
 
 #endif /* CONFIG_PRINTK */
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b6cfae75a574..6ef33cefa4d0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2929,9 +2929,10 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(con) {
+			short flags = console_srcu_read_flags(con);
 			bool progress;
 
-			if (!console_is_usable(con))
+			if (!console_is_usable(con, flags))
 				continue;
 			any_usable = true;
 
@@ -3754,7 +3755,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 			 * that they make forward progress, so only increment
 			 * @diff for usable consoles.
 			 */
-			if (!console_is_usable(c))
+			if (!console_is_usable(c, flags))
 				continue;
 
 			if (flags & CON_NBCON) {
-- 
2.39.2



* [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (2 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable() John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22  9:33   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing John Ogness
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

WARN/OOPS/PANIC require printing out immediately since the
regular printing method (and in the future, the printing
threads) might not be able to run.

Add per-CPU state to denote the priority/urgency of the output
and provide functions to mark the beginning and end of sections
where the urgent messages are generated.

Note that when a CPU is in a priority elevated state, flushing
only occurs when dropping back to a lower priority. This allows
the full set of printk records (WARN/OOPS/PANIC output) to be
stored in the ringbuffer before beginning to flush the backlog.
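
The enter/exit pairing described above can be sketched as a plain
userspace model. This is illustrative only: the struct layout and the
nesting rule mirror this patch, but the enum ordering is an assumption
and the real code operates on per-CPU data with migration disabled.

```c
#include <assert.h>

/* Assumed priority ordering: NONE < NORMAL < EMERGENCY < PANIC. */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
	NBCON_PRIO_MAX,
};

struct nbcon_cpu_state {
	enum nbcon_prio	prio;
	int		nesting[NBCON_PRIO_MAX];
};

/* Stands in for this_cpu_ptr(&nbcon_pcpu_state). */
static struct nbcon_cpu_state cpu_state;

static enum nbcon_prio model_atomic_enter(enum nbcon_prio prio)
{
	enum nbcon_prio prev_prio = cpu_state.prio;

	if (prio > prev_prio)
		cpu_state.prio = prio;

	/*
	 * Nest on the (possibly higher) effective priority so that a
	 * WARN nested within a panic printout does not scribble state.
	 */
	cpu_state.nesting[cpu_state.prio]++;

	return prev_prio;
}

static void model_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
{
	cpu_state.nesting[cpu_state.prio]--;
	cpu_state.prio = prev_prio;
}
```

A WARN landing inside a PANIC section enters while the effective
priority is already NBCON_PRIO_PANIC, so only the panic nest counter
moves and the restore path unwinds cleanly.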

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h |  4 ++
 kernel/printk/nbcon.c   | 89 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index e4fc6f7c1496..25a3ddd39083 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -452,10 +452,14 @@ static inline bool console_is_registered(const struct console *con)
 	hlist_for_each_entry(con, &console_list, node)
 
 #ifdef CONFIG_PRINTK
+extern enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio);
+extern void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio);
 extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 #else
+static inline enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio) { return NBCON_PRIO_NONE; }
+static inline void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) { }
 static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index b96077152f49..9359906b575b 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -961,6 +961,95 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	return nbcon_context_exit_unsafe(ctxt);
 }
 
+/**
+ * struct nbcon_cpu_state - Per CPU printk context state
+ * @prio:	The current context priority level
+ * @nesting:	Per priority nest counter
+ */
+struct nbcon_cpu_state {
+	enum nbcon_prio		prio;
+	int			nesting[NBCON_PRIO_MAX];
+};
+
+static DEFINE_PER_CPU(struct nbcon_cpu_state, nbcon_pcpu_state);
+static struct nbcon_cpu_state early_nbcon_pcpu_state __initdata;
+
+/**
+ * nbcon_get_cpu_state - Get the per CPU console state pointer
+ *
+ * Returns either a pointer to the per CPU state of the current CPU or to
+ * the init data state during early boot.
+ */
+static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
+{
+	if (!printk_percpu_data_ready())
+		return &early_nbcon_pcpu_state;
+
+	return this_cpu_ptr(&nbcon_pcpu_state);
+}
+
+/**
+ * nbcon_atomic_enter - Enter a context that enforces atomic printing
+ * @prio:	Priority of the context
+ *
+ * Return:	The previous priority that needs to be fed into
+ *		the corresponding nbcon_atomic_exit()
+ * Context:	Any context. Disables migration.
+ */
+enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio)
+{
+	struct nbcon_cpu_state *cpu_state;
+	enum nbcon_prio prev_prio;
+
+	migrate_disable();
+
+	cpu_state = nbcon_get_cpu_state();
+
+	prev_prio = cpu_state->prio;
+	if (prio > prev_prio)
+		cpu_state->prio = prio;
+
+	/*
+	 * Increment the nesting on @cpu_state->prio (instead of
+	 * @prio) so that a WARN() nested within a panic printout
+	 * does not attempt to scribble state.
+	 */
+	cpu_state->nesting[cpu_state->prio]++;
+
+	return prev_prio;
+}
+
+/**
+ * nbcon_atomic_exit - Exit a context that enforces atomic printing
+ * @prio:	Priority of the context to leave
+ * @prev_prio:	Priority of the previous context for restore
+ *
+ * Context:	Any context. Enables migration.
+ *
+ * @prev_prio is the priority returned by the corresponding
+ * nbcon_atomic_enter().
+ */
+void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
+{
+	struct nbcon_cpu_state *cpu_state;
+
+	cpu_state = nbcon_get_cpu_state();
+
+	/*
+	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
+	 * priority.
+	 */
+	cpu_state->nesting[cpu_state->prio]--;
+
+	/*
+	 * Restore the previous priority, which was returned by
+	 * nbcon_atomic_enter().
+	 */
+	cpu_state->prio = prev_prio;
+
+	migrate_enable();
+}
+
 /**
  * nbcon_alloc - Allocate buffers needed by the nbcon console
  * @con:	Console to allocate buffers for
-- 
2.39.2



* [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (3 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22 12:32   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console " John Ogness
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

From: Thomas Gleixner <tglx@linutronix.de>

Provide nbcon_atomic_flush_all() to perform atomic write
flushing of all registered nbcon consoles. As with the legacy
consoles, the nbcon consoles are flushed one record per
console. This allows all nbcon consoles to make progress
pseudo-simultaneously, rather than one console waiting for the
full ringbuffer to be dumped to another console before
printing anything.

Note that if the current CPU is in a nested elevated priority
state (EMERGENCY/PANIC), nbcon_atomic_flush_all() does nothing.
This ensures that if the printing itself generates urgent
messages (OOPS/WARN/PANIC), those messages are fully stored
into the ringbuffer before any printing resumes.
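
The skip-when-nested rule reduces to one check; a standalone model of
it follows (illustrative: the condition mirrors the
__nbcon_atomic_flush_all() hunk in this patch, the enum ordering is an
assumption, everything else is scaffolding):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed priority ordering: NONE < NORMAL < EMERGENCY < PANIC. */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
	NBCON_PRIO_MAX,
};

struct nbcon_cpu_state {
	enum nbcon_prio	prio;
	int		nesting[NBCON_PRIO_MAX];
};

/*
 * Only the outermost section of an elevated (EMERGENCY/PANIC)
 * priority actually flushes; nested sections return early so their
 * messages land in the ringbuffer first. NORMAL and below are not
 * subject to nesting and may always proceed.
 */
static bool model_flush_permitted(const struct nbcon_cpu_state *s)
{
	if (s->prio > NBCON_PRIO_NORMAL && s->nesting[s->prio] != 1)
		return false;
	return true;
}
```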

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/printk.h |   6 +++
 kernel/printk/nbcon.c  | 101 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 8ef499ab3c1e..58e5f34d6df1 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -192,6 +192,7 @@ void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
 void printk_trigger_flush(void);
+extern void nbcon_atomic_flush_all(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -271,6 +272,11 @@ static inline void dump_stack(void)
 static inline void printk_trigger_flush(void)
 {
 }
+
+static inline void nbcon_atomic_flush_all(void)
+{
+}
+
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 9359906b575b..2a9ff18fc78c 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -571,7 +571,6 @@ static struct printk_buffers panic_nbcon_pbufs;
  * in an unsafe state. Otherwise, on success the caller may assume
  * the console is not in an unsafe state.
  */
-__maybe_unused
 static bool nbcon_context_try_acquire(struct nbcon_context *ctxt)
 {
 	unsigned int cpu = smp_processor_id();
@@ -873,7 +872,6 @@ EXPORT_SYMBOL_GPL(nbcon_exit_unsafe);
  * When true is returned, @wctxt->ctxt.backlog indicates whether there are
  * still records pending in the ringbuffer,
  */
-__maybe_unused
 static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
@@ -988,6 +986,105 @@ static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
 	return this_cpu_ptr(&nbcon_pcpu_state);
 }
 
+/**
+ * nbcon_atomic_emit_one - Print one record for a console in atomic mode
+ * @wctxt:			An initialized write context struct to use
+ *				for this context
+ *
+ * Returns false if the given console could not print a record or there are
+ * no more records to print, otherwise true.
+ *
+ * This is an internal helper to handle the locking of the console before
+ * calling nbcon_emit_next_record().
+ */
+static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	if (!nbcon_context_try_acquire(ctxt))
+		return false;
+
+	/*
+	 * nbcon_emit_next_record() returns false when the console was
+	 * handed over or taken over. In both cases the context is no
+	 * longer valid.
+	 */
+	if (!nbcon_emit_next_record(wctxt))
+		return false;
+
+	nbcon_context_release(ctxt);
+
+	return prb_read_valid(prb, ctxt->seq, NULL);
+}
+
+/**
+ * __nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
+ * @allow_unsafe_takeover:	True, to allow unsafe hostile takeovers
+ */
+static void __nbcon_atomic_flush_all(bool allow_unsafe_takeover)
+{
+	struct nbcon_write_context wctxt = { };
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+	struct nbcon_cpu_state *cpu_state;
+	struct console *con;
+	bool any_progress;
+	int cookie;
+
+	cpu_state = nbcon_get_cpu_state();
+
+	/*
+	 * Let the outermost flush of this priority print. This avoids
+	 * nasty hackery for nested WARN() where the printing itself
+	 * generates one and ensures such nested messages are stored to
+	 * the ringbuffer before any printing resumes.
+	 *
+	 * cpu_state->prio <= NBCON_PRIO_NORMAL is not subject to nesting
+	 * and can proceed in order to allow any atomic printing for
+	 * regular kernel messages.
+	 */
+	if (cpu_state->prio > NBCON_PRIO_NORMAL &&
+	    cpu_state->nesting[cpu_state->prio] != 1)
+		return;
+
+	do {
+		any_progress = false;
+
+		cookie = console_srcu_read_lock();
+		for_each_console_srcu(con) {
+			short flags = console_srcu_read_flags(con);
+			bool progress;
+
+			if (!(flags & CON_NBCON))
+				continue;
+
+			if (!console_is_usable(con, flags))
+				continue;
+
+			memset(ctxt, 0, sizeof(*ctxt));
+			ctxt->console			= con;
+			ctxt->spinwait_max_us		= 2000;
+			ctxt->prio			= cpu_state->prio;
+			ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;
+
+			progress = nbcon_atomic_emit_one(&wctxt);
+			if (!progress)
+				continue;
+			any_progress = true;
+		}
+		console_srcu_read_unlock(cookie);
+	} while (any_progress);
+}
+
+/**
+ * nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
+ *
+ * Context:	Any context where migration is disabled.
+ */
+void nbcon_atomic_flush_all(void)
+{
+	__nbcon_atomic_flush_all(false);
+}
+
 /**
  * nbcon_atomic_enter - Enter a context that enforces atomic printing
  * @prio:	Priority of the context
-- 
2.39.2



* [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (4 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-22 17:41   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all() John Ogness
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

From: Thomas Gleixner <tglx@linutronix.de>

Perform nbcon console atomic flushing at key call sites:

nbcon_atomic_exit() - When exiting from the outermost atomic
	printing section. If the priority was NBCON_PRIO_PANIC,
	attempt a second flush allowing unsafe hostile
	takeovers.

console_flush_on_panic() - Called from several call sites to
	trigger ringbuffer dumping in an urgent situation.
	Typically the panic() function will take care of
	atomically flushing the nbcon consoles on panic.
	However, there are several users of
	console_flush_on_panic() outside of panic().

printk_trigger_flush() - Used in urgent situations to trigger a
	dump in an irq_work context. However, the atomic
	flushing part happens in the calling context since an
	alternative context is not required.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 kernel/printk/nbcon.c  | 10 ++++++++++
 kernel/printk/printk.c |  6 ++++++
 2 files changed, 16 insertions(+)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 2a9ff18fc78c..82e6a1678363 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1092,6 +1092,11 @@ void nbcon_atomic_flush_all(void)
  * Return:	The previous priority that needs to be fed into
  *		the corresponding nbcon_atomic_exit()
  * Context:	Any context. Disables migration.
+ *
+ * When within an atomic printing section, no atomic printing occurs. This
+ * is to allow all emergency messages to be dumped into the ringbuffer before
+ * flushing the ringbuffer. The actual atomic printing occurs when exiting
+ * the outermost atomic printing section.
  */
 enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio)
 {
@@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
 {
 	struct nbcon_cpu_state *cpu_state;
 
+	__nbcon_atomic_flush_all(false);
+
 	cpu_state = nbcon_get_cpu_state();
 
+	if (cpu_state->prio == NBCON_PRIO_PANIC)
+		__nbcon_atomic_flush_all(true);
+
 	/*
 	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
 	 * priority.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6ef33cefa4d0..419c0fabc481 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
 		console_srcu_read_unlock(cookie);
 	}
 
+	nbcon_atomic_flush_all();
+
 	console_flush_all(false, &next_seq, &handover);
 }
 
@@ -3903,6 +3905,10 @@ void defer_console_output(void)
 
 void printk_trigger_flush(void)
 {
+	migrate_disable();
+	nbcon_atomic_flush_all();
+	migrate_enable();
+
 	defer_console_output();
 }
 
-- 
2.39.2



* [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all()
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (5 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console " John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-26 11:34   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic John Ogness
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

In atomic printing sections, the nbcon consoles have their own
behavior: all urgent messages are first stored into the
ringbuffer and the backlog is flushed afterwards.

However, the nbcon consoles must also emit messages when not
within atomic printing sections. For this, the existing
console_flush_all() function can be used.

Provide nbcon_console_emit_next_record(), which acts as the
nbcon variant of console_emit_next_record(). Call this variant
within console_flush_all() for nbcon consoles.

Note that when in an atomic printing section,
nbcon_console_emit_next_record() does nothing. This is because
atomic printing sections will handle the nbcon flushing when
exiting the outermost atomic printing section.
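
That gate is just a priority check ahead of the per-record emit. A
minimal sketch (illustrative only: the priority comparison mirrors the
patch, the enum ordering is an assumption, and the emit step is
stubbed out as a counter standing in for nbcon_atomic_emit_one()):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed priority ordering: NONE < NORMAL < EMERGENCY < PANIC. */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
	NBCON_PRIO_MAX,
};

/* Stand-ins for the per-CPU state and the ringbuffer backlog. */
static enum nbcon_prio cpu_prio = NBCON_PRIO_NORMAL;
static int pending_records = 3;

static bool model_console_emit_next_record(void)
{
	/*
	 * As in the patch: emit from console_flush_all() only when
	 * this CPU is not in an elevated atomic priority state;
	 * elevated sections flush on section exit instead.
	 */
	if (cpu_prio > NBCON_PRIO_NORMAL)
		return false;

	if (pending_records == 0)
		return false;

	pending_records--;	/* stands in for emitting one record */
	return true;
}
```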

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  2 ++
 kernel/printk/nbcon.c    | 46 ++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c   | 26 +++++++++++++++--------
 3 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6780911fa8f2..a3c6b5ce80e4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -77,6 +77,7 @@ void nbcon_seq_force(struct console *con, u64 seq);
 bool nbcon_alloc(struct console *con);
 void nbcon_init(struct console *con);
 void nbcon_free(struct console *con);
+bool nbcon_console_emit_next_record(struct console *con);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -131,6 +132,7 @@ static inline void nbcon_seq_force(struct console *con, u64 seq) { }
 static inline bool nbcon_alloc(struct console *con) { return false; }
 static inline void nbcon_init(struct console *con) { }
 static inline void nbcon_free(struct console *con) { }
+static inline bool nbcon_console_emit_next_record(struct console *con) { return false; }
 
 static inline bool console_is_usable(struct console *con, short flags) { return false; }
 
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 82e6a1678363..de473a1003d8 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1017,6 +1017,52 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	return prb_read_valid(prb, ctxt->seq, NULL);
 }
 
+/**
+ * nbcon_console_emit_next_record - Print one record for an nbcon console
+ *					in atomic mode
+ * @con:	The console to print on
+ *
+ * Return:	True if a record could be printed, otherwise false.
+ *
+ * This function is meant to be called by console_flush_all() to atomically
+ * print records on nbcon consoles. Essentially it is the nbcon version of
+ * console_emit_next_record().
+ *
+ * This function also returns false if the current CPU is in an elevated
+ * atomic priority state in order to allow the CPU to get all of the
+ * emergency messages into the ringbuffer first.
+ */
+bool nbcon_console_emit_next_record(struct console *con)
+{
+	struct nbcon_cpu_state *cpu_state;
+	bool progress = false;
+
+	migrate_disable();
+
+	cpu_state = nbcon_get_cpu_state();
+
+	/*
+	 * Atomic printing from console_flush_all() only occurs if this
+	 * CPU is not in an elevated atomic priority state. If it is, the
+	 * atomic printing will occur when this CPU exits that state. This
+	 * allows a set of emergency messages to be completely stored in
+	 * the ringbuffer before this CPU begins flushing.
+	 */
+	if (cpu_state->prio <= NBCON_PRIO_NORMAL) {
+		struct nbcon_write_context wctxt = { };
+		struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+
+		ctxt->console	= con;
+		ctxt->prio	= NBCON_PRIO_NORMAL;
+
+		progress = nbcon_atomic_emit_one(&wctxt);
+	}
+
+	migrate_enable();
+
+	return progress;
+}
+
 /**
  * __nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
  * @allow_unsafe_takeover:	True, to allow unsafe hostile takeovers
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 419c0fabc481..e5e192318b8e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2930,24 +2930,32 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(con) {
 			short flags = console_srcu_read_flags(con);
+			u64 printk_seq;
 			bool progress;
 
 			if (!console_is_usable(con, flags))
 				continue;
 			any_usable = true;
 
-			progress = console_emit_next_record(con, handover, cookie);
+			if (flags & CON_NBCON) {
+				progress = nbcon_console_emit_next_record(con);
+				printk_seq = nbcon_seq_read(con);
+			} else {
+				progress = console_emit_next_record(con, handover, cookie);
 
-			/*
-			 * If a handover has occurred, the SRCU read lock
-			 * is already released.
-			 */
-			if (*handover)
-				return false;
+				/*
+				 * If a handover has occurred, the SRCU read
+				 * lock is already released.
+				 */
+				if (*handover)
+					return false;
+
+				printk_seq = con->seq;
+			}
 
 			/* Track the next of the highest seq flushed. */
-			if (con->seq > *next_seq)
-				*next_seq = con->seq;
+			if (printk_seq > *next_seq)
+				*next_seq = printk_seq;
 
 			if (!progress)
 				continue;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (6 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all() John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-27 12:02   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Andy Shevchenko,
	Guilherme G. Piccoli, Arnd Bergmann

From: Thomas Gleixner <tglx@linutronix.de>

Invoke the atomic write enforcement functions for warn/panic to
ensure that the information gets out to the consoles.

For the panic case, add explicit intermediate atomic flush
calls to ensure immediate flushing at important points.
Otherwise the atomic flushing only occurs when dropping out of
the elevated priority, which for panic may never happen.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 kernel/panic.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index 07239d4ad81e..86ed71ba8c4d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -275,6 +275,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
  */
 void panic(const char *fmt, ...)
 {
+	enum nbcon_prio prev_prio;
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0, len;
@@ -322,6 +323,8 @@ void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_PANIC);
+
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
@@ -382,6 +385,8 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
+	nbcon_atomic_flush_all();
+
 	console_unblank();
 
 	/*
@@ -406,6 +411,7 @@ void panic(const char *fmt, ...)
 		 * We can't use the "normal" timers since we just panicked.
 		 */
 		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+		nbcon_atomic_flush_all();
 
 		for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
 			touch_nmi_watchdog();
@@ -424,6 +430,7 @@ void panic(const char *fmt, ...)
 		 */
 		if (panic_reboot_mode != REBOOT_UNDEFINED)
 			reboot_mode = panic_reboot_mode;
+		nbcon_atomic_flush_all();
 		emergency_restart();
 	}
 #ifdef __sparc__
@@ -436,12 +443,16 @@ void panic(const char *fmt, ...)
 	}
 #endif
 #if defined(CONFIG_S390)
+	nbcon_atomic_flush_all();
 	disabled_wait();
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
 	/* Do not scroll important messages printed above */
 	suppress_printk = 1;
+
+	nbcon_atomic_exit(NBCON_PRIO_PANIC, prev_prio);
+
 	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
@@ -652,6 +663,10 @@ struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	enum nbcon_prio prev_prio;
+
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
 	disable_trace_on_warning();
 
 	if (file)
@@ -682,6 +697,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 
 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 #ifdef CONFIG_BUG
-- 
2.39.2



* [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (7 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-19 23:36   ` John Ogness
                     ` (2 more replies)
  2023-09-19 23:08 ` [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls John Ogness
  2023-09-19 23:08 ` [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats John Ogness
  10 siblings, 3 replies; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli, Andy Shevchenko

Invoke the atomic write enforcement functions for oops to
ensure that the information gets out to the consoles.

Since there is no single general function that calls both
oops_enter() and oops_exit(), the nesting feature of atomic
write sections is taken advantage of in order to guarantee
full coverage between the first oops_enter() and the last
oops_exit().

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/panic.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index 86ed71ba8c4d..e2879098645d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -614,6 +614,10 @@ bool oops_may_print(void)
 	return pause_on_oops_flag == 0;
 }
 
+static atomic_t oops_cpu = ATOMIC_INIT(-1);
+static int oops_nesting;
+static enum nbcon_prio oops_prev_prio;
+
 /*
  * Called when the architecture enters its oops handler, before it prints
  * anything.  If this is the first CPU to oops, and it's oopsing the first
@@ -630,6 +634,36 @@ bool oops_may_print(void)
  */
 void oops_enter(void)
 {
+	enum nbcon_prio prev_prio;
+	int cpu = -1;
+
+	/*
+	 * If this turns out to be the first CPU in oops, this is the
+	 * beginning of the outermost atomic section. Otherwise it is
+	 * the beginning of an inner atomic section.
+	 */
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
+	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
+		/*
+		 * This is the first CPU in oops. Save the outermost
+		 * @prev_prio in order to restore it on the outermost
+		 * matching oops_exit(), when @oops_nesting == 0.
+		 */
+		oops_prev_prio = prev_prio;
+
+		/*
+		 * Enter an inner atomic section that ends at the end of this
+		 * function. In this case, the nbcon_atomic_enter() above
+		 * began the outermost atomic section.
+		 */
+		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+	}
+
+	/* Track nesting when this CPU is the owner. */
+	if (cpu == -1 || cpu == smp_processor_id())
+		oops_nesting++;
+
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
 	debug_locks_off();
@@ -637,6 +671,9 @@ void oops_enter(void)
 
 	if (sysctl_oops_all_cpu_backtrace)
 		trigger_all_cpu_backtrace();
+
+	/* Exit inner atomic section. */
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_oops_end_marker(void)
@@ -652,6 +689,18 @@ void oops_exit(void)
 {
 	do_oops_enter_exit();
 	print_oops_end_marker();
+
+	if (atomic_read(&oops_cpu) == smp_processor_id()) {
+		oops_nesting--;
+		if (oops_nesting == 0) {
+			atomic_set(&oops_cpu, -1);
+
+			/* Exit outermost atomic section. */
+			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
+		}
+	}
+	put_cpu();
+
 	kmsg_dump(KMSG_DUMP_OOPS);
 }
 
-- 
2.39.2



* [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (8 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-27 15:00   ` Petr Mladek
  2023-09-19 23:08 ` [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats John Ogness
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu

Invoke the atomic write enforcement functions for rcu stalls to
ensure that the information gets out to the consoles.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/rcu/tree_stall.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 6f06dc12904a..0a58f8b233d8 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -8,6 +8,7 @@
  */
 
 #include <linux/kvm_para.h>
+#include <linux/console.h>
 
 //////////////////////////////////////////////////////////////////////////////
 //
@@ -582,6 +583,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
 
 static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 {
+	enum nbcon_prio prev_prio;
 	int cpu;
 	unsigned long flags;
 	unsigned long gpa;
@@ -597,6 +599,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	if (rcu_stall_is_suppressed())
 		return;
 
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
 	/*
 	 * OK, time to rat on our buddy...
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -651,6 +655,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	panic_on_rcu_stall();
 
 	rcu_force_quiescent_state();  /* Kick them all. */
+
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_cpu_stall(unsigned long gps)
-- 
2.39.2



* [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats
  2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
                   ` (9 preceding siblings ...)
  2023-09-19 23:08 ` [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls John Ogness
@ 2023-09-19 23:08 ` John Ogness
  2023-09-29  8:31   ` Petr Mladek
  10 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng

Invoke the atomic write enforcement functions for lockdep
splats to ensure that the information gets out to the consoles.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/locking/lockdep.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..5310a94e3efd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -56,6 +56,7 @@
 #include <linux/kprobes.h>
 #include <linux/lockdep.h>
 #include <linux/context_tracking.h>
+#include <linux/console.h>
 
 #include <asm/sections.h>
 
@@ -3967,9 +3968,13 @@ static void
 print_usage_bug(struct task_struct *curr, struct held_lock *this,
 		enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
 {
+	enum nbcon_prio prev_prio;
+
 	if (!debug_locks_off() || debug_locks_silent)
 		return;
 
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
 	pr_warn("\n");
 	pr_warn("================================\n");
 	pr_warn("WARNING: inconsistent lock state\n");
@@ -3998,6 +4003,8 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
 
 	pr_warn("\nstack backtrace:\n");
 	dump_stack();
+
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 /*
-- 
2.39.2



* Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
@ 2023-09-19 23:36   ` John Ogness
  2023-09-20 13:28   ` Andy Shevchenko
  2023-09-27 13:15   ` Petr Mladek
  2 siblings, 0 replies; 44+ messages in thread
From: John Ogness @ 2023-09-19 23:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli, Andy Shevchenko

On 2023-09-20, John Ogness <john.ogness@linutronix.de> wrote:
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 86ed71ba8c4d..e2879098645d 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -630,6 +634,36 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	enum nbcon_prio prev_prio;
> +	int cpu = -1;

oops_exit() calls put_cpu(). Here used to be the corresponding
get_cpu(). Somehow I managed to drop it. Here are the inline fixups.

int cur_cpu = get_cpu();
int old_cpu = -1;

> +
> +	/*
> +	 * If this turns out to be the first CPU in oops, this is the
> +	 * beginning of the outermost atomic section. Otherwise it is
> +	 * the beginning of an inner atomic section.
> +	 */
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> +	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {

if (atomic_try_cmpxchg_relaxed(&oops_cpu, &old_cpu, cur_cpu)) {

> +		/*
> +		 * This is the first CPU in oops. Save the outermost
> +		 * @prev_prio in order to restore it on the outermost
> +		 * matching oops_exit(), when @oops_nesting == 0.
> +		 */
> +		oops_prev_prio = prev_prio;
> +
> +		/*
> +		 * Enter an inner atomic section that ends at the end of this
> +		 * function. In this case, the nbcon_atomic_enter() above
> +		 * began the outermost atomic section.
> +		 */
> +		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +	}
> +
> +	/* Track nesting when this CPU is the owner. */
> +	if (cpu == -1 || cpu == smp_processor_id())

if (old_cpu == -1 || old_cpu == cur_cpu)

> +		oops_nesting++;
> +
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();


* Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
  2023-09-19 23:36   ` John Ogness
@ 2023-09-20 13:28   ` Andy Shevchenko
  2023-09-20 14:20     ` John Ogness
  2023-09-27 13:15   ` Petr Mladek
  2 siblings, 1 reply; 44+ messages in thread
From: Andy Shevchenko @ 2023-09-20 13:28 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli

On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
> 
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.

...

> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> +		oops_nesting--;
> +		if (oops_nesting == 0) {
> +			atomic_set(&oops_cpu, -1);

Between read and set the variable can change, can't it?
If not, why this variable is atomic then? Or, why it's not a problem?
If the latter is the case, perhaps a comment to explain this?

> +			/* Exit outermost atomic section. */
> +			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
> +		}
> +	}
> +	put_cpu();

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-20 13:28   ` Andy Shevchenko
@ 2023-09-20 14:20     ` John Ogness
  2023-09-20 14:45       ` Andy Shevchenko
  0 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-20 14:20 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli

On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
>> Invoke the atomic write enforcement functions for oops to
>> ensure that the information gets out to the consoles.
>> 
>> Since there is no single general function that calls both
>> oops_enter() and oops_exit(), the nesting feature of atomic
>> write sections is taken advantage of in order to guarantee
>> full coverage between the first oops_enter() and the last
>> oops_exit().
>> 
>> It is important to note that if there are any legacy consoles
>> registered, they will be attempting to directly print from the
>> printk-caller context, which may jeopardize the reliability of
>> the atomic consoles. Optimally there should be no legacy
>> consoles registered.
>
> ...
>
>> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
>> +		oops_nesting--;
>> +		if (oops_nesting == 0) {
>> +			atomic_set(&oops_cpu, -1);
>
> Between read and set the variable can change, can't it?

CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
that is printing the oops. (Perhaps the variable should be called
"oops_printing_cpu"?)

If this matches smp_processor_id(), then the current CPU is the only one
that is allowed to change it back to -1. So no, if the first condition
is true, it cannot change before atomic_set(). And if the second
condition is true, this is the only CPU+context that is allowed to
change it back to -1.

> If not, why this variable is atomic then? Or, why it's not a problem?
> If the latter is the case, perhaps a comment to explain this?

If not atomic, it would be a data race, since one CPU might be changing
@oops_cpu while another is reading it. For type "int" such a data race
would be fine because it doesn't matter which side of the race the
reader was on; either way the value read will not match the current CPU ID.

The reason that I didn't implement it using cmpxchg(),
data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
there are architectures that do not support cmpxchg() as an atomic
instruction. The answer was always: "use atomic_t instead... that is
what it is for".

But AFAICT for this case it would be fine because obviously cmpxchg()
will not race with itself. And successfully reading a matching CPU ID
means there cannot be any cmpxchg() in progress. And writing only occurs
after seeing a matching CPU ID.

So I can change it from atomic_t to int, although I do feel that
might require an explanation of why the data race is safe.
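To make the ownership argument concrete, here is a minimal userspace
sketch of the single-owner pattern described above, using C11
<stdatomic.h> as a stand-in for the kernel's atomic_t. The function
names and the simulated CPU id parameter are made up for the example;
real migration/preemption handling is omitted:

```c
/*
 * Userspace sketch (NOT kernel code) of the pattern: the first
 * caller claims ownership by installing its "CPU id" with a
 * compare-and-exchange, re-entry on the owning CPU only bumps a
 * nest counter, and only the owner ever resets the id to -1.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int owner_cpu = -1;
static int nesting;

/* Returns true if @cpu became (or already was) the owner. */
static bool oops_like_enter(int cpu)
{
	int old = -1;

	/* Only one CPU can win this race; losers see the owner's id in @old. */
	atomic_compare_exchange_strong(&owner_cpu, &old, cpu);

	if (old == -1 || old == cpu) {
		nesting++;
		return true;
	}
	return false;	/* some other CPU owns the section */
}

/* Returns true if this call released ownership (outermost exit). */
static bool oops_like_exit(int cpu)
{
	/*
	 * Safe without a cmpxchg: only the owner can observe its own
	 * id here, so no other CPU can be modifying owner_cpu between
	 * the load and the store.
	 */
	if (atomic_load(&owner_cpu) == cpu) {
		if (--nesting == 0) {
			atomic_store(&owner_cpu, -1);
			return true;
		}
	}
	return false;
}
```

This mirrors why the atomic_read()/atomic_set() pair in oops_exit()
cannot race: a reader that matches its own CPU id is, by construction,
the only writer.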

Or perhaps it is enough just to have something like this:

/**
 * oops_printing_cpu - The ID of the CPU responsible for printing the
 *                     OOPS message(s) to the consoles.
 *
 * This is atomic_t because multiple CPUs can read this variable
 * simultaneously when exiting OOPS while another CPU can be
 * modifying this variable to begin or end its printing duties.
 */
static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);

John Ogness


* Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-20 14:20     ` John Ogness
@ 2023-09-20 14:45       ` Andy Shevchenko
  0 siblings, 0 replies; 44+ messages in thread
From: Andy Shevchenko @ 2023-09-20 14:45 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli

On Wed, Sep 20, 2023 at 04:26:12PM +0206, John Ogness wrote:
> On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:

...

> >> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> >> +		oops_nesting--;
> >> +		if (oops_nesting == 0) {
> >> +			atomic_set(&oops_cpu, -1);
> >
> > Between read and set the variable can change, can't it?
> 
> CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
> that is printing the oops. (Perhaps the variable should be called
> "oops_printing_cpu"?)
> 
> If this matches smp_processor_id(), then the current CPU is the only one
> that is allowed to change it back to -1. So no, if the first condition
> is true, it cannot change before atomic_set(). And if the second
> condition is true, this is the only CPU+context that is allowed to
> change it back to -1;
> 
> > If not, why this variable is atomic then? Or, why it's not a problem?
> > If the latter is the case, perhaps a comment to explain this?
> 
> If not atomic, it will be a data race since one CPU might be changing
> @oops_cpu and another is reading it. For type "int" such a data race
> would be fine because it doesn't matter which side of the race the
> reader was on, both values will not match the current CPU ID.
> 
> The reason that I didn't implement it using cmpxchg(),
> data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
> you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
> there are architectures that do not support cmpxchg() as an atomic
> instruction. The answer was always: "use atomic_t instead... that is
> what it is for".
> 
> But AFAICT for this case it would be fine because obviously cmpxchg()
> will not race with itself. And successfully reading a matching CPU ID
> means there cannot be any cmpxchg() in progress. And writing only occurs
> after seeing a matching CPU ID.
> 
> So I can change it from atomic_t to int. Although I do feel like that
> might require explanation about why the data race is safe.

Either way a comment is needed, but I think the usage of atomic above
is a bit confusing, as you can see I immediately raised the concern.

> Or perhaps it is enough just to have something like this:
> 
> /**
>  * oops_printing_cpu - The ID of the CPU responsible for printing the
>  *                     OOPS message(s) to the consoles.
>  *
>  * This is atomic_t because multiple CPUs can read this variable
>  * simultaneously when exiting OOPS while another CPU can be
>  * modifying this variable to begin or end its printing duties.
>  */
> static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon
  2023-09-19 23:08 ` [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon John Ogness
@ 2023-09-22  8:33   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-22  8:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:46, John Ogness wrote:
> Move console_is_usable() as-is into internal.h so that it can
> be used by nbcon printing functions as well.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

The function is relatively complex for being defined as inline
in a header file. IMHO, it is around a sane limit, especially
because it is not used in fast paths. Anyway...

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon
  2023-09-19 23:08 ` [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon John Ogness
@ 2023-09-22  8:37   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-22  8:37 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:47, John Ogness wrote:
> The nbcon consoles use a different printing callback. For nbcon
> consoles, check for the write_atomic() callback instead of
> write().
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable()
  2023-09-19 23:08 ` [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable() John Ogness
@ 2023-09-22  8:41   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-22  8:41 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:48, John Ogness wrote:
> The caller of console_is_usable() usually needs @console->flags
> for its own checks. Rather than having console_is_usable() read
> its own copy, make the caller pass in the @flags. This also
> ensures that the caller saw the same @flags value.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-09-19 23:08 ` [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections John Ogness
@ 2023-09-22  9:33   ` Petr Mladek
  2023-09-25  9:25     ` John Ogness
  0 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-09-22  9:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Wed 2023-09-20 01:14:49, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> WARN/OOPS/PANIC require printing out immediately since the
> regular printing method (and in the future, the printing
> threads) might not be able to run.
> 
> Add per-CPU state to denote the priority/urgency of the output
> and provide functions to mark the beginning and end of sections
> where the urgent messages are generated.
> 
> Note that when a CPU is in a priority elevated state, flushing
> only occurs when dropping back to a lower priority. This allows
> the full set of printk records (WARN/OOPS/PANIC output) to be
> stored in the ringbuffer before beginning to flush the backlog.

The above paragraph is a bit confusing. The code added by this patch
does not do any flushing. I guess that this last paragraph is supposed
to explain why the "nesting" array is needed. I would write
something like:

"The state also counts nesting of printing contexts per-priority.
It will later be used to prevent flushing in nested contexts."

That said, I am not sure if the special handling of nested contexts
is needed. But let's discuss it in the patch introducing the flush
functions.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -961,6 +961,95 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	return nbcon_context_exit_unsafe(ctxt);
>  }
>  
> +/**
> + * struct nbcon_cpu_state - Per CPU printk context state
> + * @prio:	The current context priority level
> + * @nesting:	Per priority nest counter
> + */
> +struct nbcon_cpu_state {
> +	enum nbcon_prio		prio;
> +	int			nesting[NBCON_PRIO_MAX];
> +};
> +
> +static DEFINE_PER_CPU(struct nbcon_cpu_state, nbcon_pcpu_state);
> +static struct nbcon_cpu_state early_nbcon_pcpu_state __initdata;
> +
> +/**
> + * nbcon_get_cpu_state - Get the per CPU console state pointer
> + *
> + * Returns either a pointer to the per CPU state of the current CPU or to
> + * the init data state during early boot.
> + */
> +static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
> +{
> +	if (!printk_percpu_data_ready())
> +		return &early_nbcon_pcpu_state;

My first thought was that this was racy. I was afraid that
printk_percpu_data_ready() could change value inside the
nbcon_atomic_enter()/exit() area. But it actually could not happen.
Anyway, it might be worth a comment. Something like:

	/*
	 * The value of __printk_percpu_data_ready is modified in normal
	 * context. As a result it could never change inside a nbcon
	 * atomic context.
	 */
	if (!printk_percpu_data_ready())
		return &early_nbcon_pcpu_state;

> +
> +	return this_cpu_ptr(&nbcon_pcpu_state);
> +}
> +
> +/**
> + * nbcon_atomic_exit - Exit a context that enforces atomic printing
> + * @prio:	Priority of the context to leave
> + * @prev_prio:	Priority of the previous context for restore
> + *
> + * Context:	Any context. Enables migration.
> + *
> + * @prev_prio is the priority returned by the corresponding
> + * nbcon_atomic_enter().
> + */
> +void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
> +{
> +	struct nbcon_cpu_state *cpu_state;
> +
> +	cpu_state = nbcon_get_cpu_state();

I would add a consistency check:

	WARN_ON_ONCE(cpu_state->nesting[cpu_state->prio] <= 0);

> +	/*
> +	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
> +	 * priority.
> +	 */
> +	cpu_state->nesting[cpu_state->prio]--;
> +
> +	/*
> +	 * Restore the previous priority, which was returned by
> +	 * nbcon_atomic_enter().
> +	 */
> +	cpu_state->prio = prev_prio;
> +
> +	migrate_enable();
> +}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing
  2023-09-19 23:08 ` [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing John Ogness
@ 2023-09-22 12:32   ` Petr Mladek
  2023-09-25 11:11     ` John Ogness
  0 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-09-22 12:32 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:50, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Provide nbcon_atomic_flush() to perform atomic write flushing
> of all registered nbcon consoles. Like with legacy consoles,
> the nbcon consoles are flushed one record per console. This
> allows all nbcon consoles to generate pseudo-simultaneously,
> rather than one console waiting for the full ringbuffer to
> dump to another console before printing anything.
> 
> Note that if the current CPU is in a nested elevated priority
> state (EMERGENCY/PANIC), nbcon_atomic_flush() does nothing.

This confused me a bit. It was not clear to me if it was
"nested and elevated" or "the elevated priority was nested".
Well, it is probably because I am not a native speaker.

I would describe it another way, see below.

> This is in case the printing itself generates urgent messages
> (OOPS/WARN/PANIC), that those messages are fully stored into
> the ringbuffer before any printing resumes.

This feels like it was an advantage. But I would say that it is
a limitation. IMHO, it simply works this way and we should describe
it as a limitation. See below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -988,6 +986,105 @@ static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
>  	return this_cpu_ptr(&nbcon_pcpu_state);
>  }
>  
> +/**
> + * nbcon_atomic_emit_one - Print one record for a console in atomic mode
> + * @wctxt:			An initialized write context struct to use
> + *				for this context
> + *
> + * Returns false if the given console could not print a record or there are
> + * no more records to print, otherwise true.
> + *
> + * This is an internal helper to handle the locking of the console before
> + * calling nbcon_emit_next_record().
> + */
> +static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	if (!nbcon_context_try_acquire(ctxt))
> +		return false;
> +
> +	/*
> +	 * nbcon_emit_next_record() returns false when the console was
> +	 * handed over or taken over. In both cases the context is no
> +	 * longer valid.
> +	 */
> +	if (!nbcon_emit_next_record(wctxt))
> +		return false;
> +
> +	nbcon_context_release(ctxt);
> +
> +	return prb_read_valid(prb, ctxt->seq, NULL);

IMHO, it should be enough to check ctxt->backlog. I mean to do:

	return !!ctxt->backlog;

We are here only when nbcon_emit_next_record() owned the context and
was able to call printk_get_next_message(). nbcon_emit_next_record()
and nbcon_atomic_emit_one() would work consistently especially
when prb_read_valid() is not called under the console lock here.


> +}
> +
> +/**
> + * __nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
> + * @allow_unsafe_takeover:	True, to allow unsafe hostile takeovers
> + */
> +static void __nbcon_atomic_flush_all(bool allow_unsafe_takeover)
> +{
> +	struct nbcon_write_context wctxt = { };
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	struct nbcon_cpu_state *cpu_state;
> +	struct console *con;
> +	bool any_progress;
> +	int cookie;
> +
> +	cpu_state = nbcon_get_cpu_state();
> +
> +	/*
> +	 * Let the outermost flush of this priority print. This avoids
> +	 * nasty hackery for nested WARN() where the printing itself
> +	 * generates one and ensures such nested messages are stored to
> +	 * the ringbuffer before any printing resumes.

It is not clear to me what hackery was meant. The fact is that
only printk_once() or WARN_ONCE() should be used in the console
emit/flush code paths. Any non-once printk might block consoles
and even these nesting checks probably would not help much.

Anyway, I believe that we do not need this nesting counter.
The nesting is already prevented by nbcon_context_try_acquire().
It would not allow taking the nested lock with the same priority.

I guess that this extra nesting counter made some sense only in the earlier
variants of the per-cpu console lock.

I would personally just describe the behavior in the commit message
and in the comment above __nbcon_atomic_flush_all():

	* The messages are flushed only when this context is able to
	* get the per-console lock. Namely, it works only when the
	* lock is free or when this context has a higher priority
	* than the current owner.

> +	 *
> +	 * cpu_state->prio <= NBCON_PRIO_NORMAL is not subject to nesting
> +	 * and can proceed in order to allow any atomic printing for
> +	 * regular kernel messages.
> +	 */
> +	if (cpu_state->prio > NBCON_PRIO_NORMAL &&
> +	    cpu_state->nesting[cpu_state->prio] != 1)
> +		return;
> +
> +	do {
> +		any_progress = false;
> +
> +		cookie = console_srcu_read_lock();
> +		for_each_console_srcu(con) {
> +			short flags = console_srcu_read_flags(con);
> +			bool progress;
> +
> +			if (!(flags & CON_NBCON))
> +				continue;
> +
> +			if (!console_is_usable(con, flags))
> +				continue;
> +
> +			memset(ctxt, 0, sizeof(*ctxt));
> +			ctxt->console			= con;
> +			ctxt->spinwait_max_us		= 2000;
> +			ctxt->prio			= cpu_state->prio;
> +			ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;
> +
> +			progress = nbcon_atomic_emit_one(&wctxt);
> +			if (!progress)
> +				continue;
> +			any_progress = true;
> +		}
> +		console_srcu_read_unlock(cookie);
> +	} while (any_progress);
> +}
> +
> +/**
> + * nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
> + *
> + * Context:	Any context where migration is disabled.

We should make it more clear what migration is meant here. For
example:

 * Context:	Any context which could not be migrated to another CPU.

> + */
> +void nbcon_atomic_flush_all(void)
> +{
> +	__nbcon_atomic_flush_all(false);
> +}
> +

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing
  2023-09-19 23:08 ` [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console " John Ogness
@ 2023-09-22 17:41   ` Petr Mladek
  2023-09-25 13:37     ` John Ogness
  0 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-09-22 17:41 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:51, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Perform nbcon console atomic flushing at key call sites:
> 
> nbcon_atomic_exit() - When exiting from the outermost atomic
> 	printing section.

IMHO, it would not help because it all depends on
whether nbcon_context_try_acquire() succeeds or not, see below.

> 	If the priority was NBCON_PRIO_PANIC,
> 	attempt a second flush allowing unsafe hostile
> 	takeovers.

I was first scared that this would be called by any printk()
in panic(). But it seems that nbcon_atomic_exit() is called
only once, after disabling printk(). It might deserve a comment
on where it is supposed to be used. See a proposal below.


> console_flush_on_panic() - Called from several call sites to
> 	trigger ringbuffer dumping in an urgent situation.
> 
> console_flush_on_panic() - Typically the panic() function will

This is a second description of console_flush_on_panic() which
looks like a mistake.

> 	take care of atomic flushing the nbcon consoles on
> 	panic. However, there are several users of
> 	console_flush_on_panic() outside of panic().

The generic panic() seems to use console_flush_on_panic() correctly
at the very end.

Hmm, I see that console_flush_on_panic() is called also in
arch/loongarch/kernel/reset.c and arch/powerpc/kernel/traps.c.

The loongarch code uses it in halt(). IMHO, it would be
better to use the normal flush. But maybe they accept the risk.
I know nothing about this architecture and who uses it.

The function defined on powerpc is then used in:

  + arch/powerpc/platforms/powernv/opal.c:

     IMHO, it should be replaced there by normal flush. It seems
     to call the generic panic() later.

  + arch/powerpc/platforms/ps3/setup.c:

      This seems to be used as a panic notifier ppc_panic_platform_handler().
      I am not completely sure but it does not look like the final flush.

  + arch/powerpc/platforms/pseries/setup.c:

      Same as ps3/setup.c. Also "just" a panic notifier.

Anyway, we should make clear that console_flush_on_panic() might break
the system and should be called as the last attempt to flush consoles.
The above arch-specific users are worth review.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1092,6 +1092,11 @@ void nbcon_atomic_flush_all(void)
>   * Return:	The previous priority that needs to be fed into
>   *		the corresponding nbcon_atomic_exit()
>   * Context:	Any context. Disables migration.
> + *
> + * When within an atomic printing section, no atomic printing occurs. This
> + * is to allow all emergency messages to be dumped into the ringbuffer before
> + * flushing the ringbuffer.

The comment sounds like it is an advantage. But I think that it would be
a disadvantage.

Fortunately, this is not true. Even the atomic context tries
to flush the messages immediately when it is able to get
the per-cpu lock. It happens via console_flush_all() called
from console_unlock().

> + * The actual atomic printing occurs when exiting
> + * the outermost atomic printing section.
>   */
>  enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio)
>  {
> @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
>  {
>  	struct nbcon_cpu_state *cpu_state;
>  
> +	__nbcon_atomic_flush_all(false);

IMHO, this would not help. Either this context was able to acquire the
lock and flush each message directly. Or it would fail to get the lock
even here.

> +
>  	cpu_state = nbcon_get_cpu_state();
>  
> +	if (cpu_state->prio == NBCON_PRIO_PANIC)
> +		__nbcon_atomic_flush_all(true);

It would deserve a comment that nbcon_atomic_exit() is used
in panic() to calm down the consoles completely, similar effect
as setting the variable "suppress_printk".

> +
>  	/*
>  	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
>  	 * priority.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 6ef33cefa4d0..419c0fabc481 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
>  		console_srcu_read_unlock(cookie);
>  	}
>  
> +	nbcon_atomic_flush_all();
> +
>  	console_flush_all(false, &next_seq, &handover);

It seems that console_flush_all() tries to flush nbcon consoles
as well. And nbcon_atomic_flush_all() is explicitly called
just above this console_flush_on_panic(). This is from panic()
after this patchset is applied.


void panic(const char *fmt, ...)
{
[...]
	nbcon_atomic_flush_all();

	console_unblank();

	/*
	 * We may have ended up stopping the CPU holding the lock (in
	 * smp_send_stop()) while still having some valuable data in the console
	 * buffer.  Try to acquire the lock then release it regardless of the
	 * result.  The release will also print the buffers out.  Locks debug
	 * should be disabled to avoid reporting bad unlock balance when
	 * panic() is not being callled from OOPS.
	 */
	debug_locks_off();
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);


In other words, we try to flush nbcon consoles 3 times almost
immediately after each other. I believe that there is a motivation
to do so. Anyway, I want to make sure that it was on purpose.

It would deserve some comment on what the purpose is. Otherwise, people
would tend to remove the "redundant" calls.

>  }
>  
> @@ -3903,6 +3905,10 @@ void defer_console_output(void)
>  
>  void printk_trigger_flush(void)
>  {
> +	migrate_disable();
> +	nbcon_atomic_flush_all();
> +	migrate_enable();

Can this be actually called in NMI?

> +
>  	defer_console_output();
>  }

This function would deserve some description because it looks strange.
There are three names which are contradicting each other:

  + trigger_flush:    it sounds like it tells someone to do the flush

  + nbcon_atomic_flush_all: does the flush immediately

  + defer_console_output: sounds like it prevents the current context
	to flush the consoles immediately

We should either choose better names and/or add comments:

/**
 * console_flush_or_trigger() - Make sure that the consoles will get flushed.
 *
 * Try to flush consoles when possible or queue flushing consoles like
 * in the deferred printk.
 *
 * Context: Can be used in any context (including NMI?)
 */
void printk_flush_or_trigger(void)
{
	/*
	 * Try to flush consoles which do not depend on console_lock()
	 * and support safe atomic_write()
	 */
	migrate_disable();
	nbcon_atomic_flush_all();
	migrate_enable();

	/* Try flushing legacy consoles or queue the deferred handling. */
	if (!in_nmi() && console_trylock())
		console_unlock();
	else
		defer_console_output();
}


All in all. I feel a bit overwhelmed by the many *flush*() functions at
the moment. Some flush only nbcon consoles. Some flush all. Some
are using only the safe takeover/handover and some allow also
harsh takeover. Some ignore port->lock because of bust_spinlock().
Some are ignoring console_lock. They are called on different
locations. The nbcon variants are called explicitly and also
inside the generic *flush*() functions.

It is partly because it is Friday evening. Anyway, I need to think
more about it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-09-22  9:33   ` Petr Mladek
@ 2023-09-25  9:25     ` John Ogness
  2023-09-25 16:04       ` Petr Mladek
  0 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-25  9:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
>> Note that when a CPU is in a priority elevated state, flushing
>> only occurs when dropping back to a lower priority. This allows
>> the full set of printk records (WARN/OOPS/PANIC output) to be
>> stored in the ringbuffer before beginning to flush the backlog.
>
> The above paragraph is a bit confusing. The code added by this patch
> does not do any flushing.

You are right. I should put this patch after patch 5 "printk: nbcon:
Provide function for atomic flushing" to simplify the introduction.

> I guess that this last paragraph is supposed to explain why the
> "nesting" array is needed.

No, it is explaining how this feature works in general. The term
"priority elevated state" means the CPU is in an atomic write section.

The "nesting" array is needed in order to support a feature that is not
explained in the commit message: If nested OOPS/WARN/PANIC occur, only
the outermost OOPS/WARN/PANIC will do the flushing. I will add this
information to the commit message.

>> +static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
>> +{
>> +	if (!printk_percpu_data_ready())
>> +		return &early_nbcon_pcpu_state;
>
> it might worth a comment. Something like:
>
> 	/*
> 	 * The value of __printk_percpu_data_ready is modified in normal
> 	 * context. As a result it could never change inside a nbcon
> 	 * atomic context.
> 	 */
> 	if (!printk_percpu_data_ready())
> 		return &early_nbcon_pcpu_state;

OK.

>> +void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
>> +{
>> +	struct nbcon_cpu_state *cpu_state;
>> +
>> +	cpu_state = nbcon_get_cpu_state();
>
> I would add a consistency check:
>
> 	WARN_ON_ONCE(cpu_state->nesting[cpu_state->prio] <= 0)

OK.

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing
  2023-09-22 12:32   ` Petr Mladek
@ 2023-09-25 11:11     ` John Ogness
  0 siblings, 0 replies; 44+ messages in thread
From: John Ogness @ 2023-09-25 11:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
>> Note that if the current CPU is in a nested elevated priority
>> state (EMERGENCY/PANIC), nbcon_atomic_flush() does nothing.
>
> This confused me a bit. It was not clear to me if it was
> "nested and elevated" or "the elevated priority was nested".

Elevated priority within an elevated priority. Or put another way: an
atomic printing section within an atomic printing section. Maybe the
"elevated priority" terminology is confusing. I can just use "atomic
printing section" instead if that helps.

>> This is in case the printing itself generates urgent messages
>> (OOPS/WARN/PANIC), that those messages are fully stored into
>> the ringbuffer before any printing resumes.
>
> This feels like it was an advantage. But I would say that it is
> a limitation. IMHO, it simply works this way and we should describe
> it as a limitation.

The "atomic printing section" feature was the result of designing this
advantage. It "simply works this way" because that is how it was
designed.

Actually, this is explaining the nesting variable that you asked about
in the previous patch commit message. When I reverse the patch order,
this paragraph will be moved into that patch commit message.

>> +/**
>> + * nbcon_atomic_emit_one - Print one record for a console in atomic mode
>> + * @wctxt:			An initialized write context struct to use
>> + *				for this context
>> + *
>> + * Returns false if the given console could not print a record or there are
>> + * no more records to print, otherwise true.
>> + *
>> + * This is an internal helper to handle the locking of the console before
>> + * calling nbcon_emit_next_record().
>> + */
>> +static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
>> +{
>> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>> +
>> +	if (!nbcon_context_try_acquire(ctxt))
>> +		return false;
>> +
>> +	/*
>> +	 * nbcon_emit_next_record() returns false when the console was
>> +	 * handed over or taken over. In both cases the context is no
>> +	 * longer valid.
>> +	 */
>> +	if (!nbcon_emit_next_record(wctxt))
>> +		return false;
>> +
>> +	nbcon_context_release(ctxt);
>> +
>> +	return prb_read_valid(prb, ctxt->seq, NULL);
>
> IMHO, it should be enough to check ctxt->backlog. I mean to do:
>
> 	return !!ctxt->backlog;
>
> We are here only when nbcon_emit_next_record() owned the context and
> was able to call printk_get_next_message().

Yes, but ctxt->backlog is set before the printing begins. If any nested
atomic printing occurs (i.e. just adding records to the ringbuffer),
these also need to be atomically printed.

For example, console_unlock() deals with that situation with:

                /*
                 * Some context may have added new records after
                 * console_flush_all() but before unlocking the console.
                 * Re-check if there is a new record to flush. If the trylock
                 * fails, another context is already handling the printing.
                 */
        } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());

The prb_read_valid() here corresponds to the prb_read_valid() in
console_unlock(). I can add a similar comment here for that.

>> +static void __nbcon_atomic_flush_all(bool allow_unsafe_takeover)
>> +{
>> +	struct nbcon_write_context wctxt = { };
>> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
>> +	struct nbcon_cpu_state *cpu_state;
>> +	struct console *con;
>> +	bool any_progress;
>> +	int cookie;
>> +
>> +	cpu_state = nbcon_get_cpu_state();
>> +
>> +	/*
>> +	 * Let the outermost flush of this priority print. This avoids
>> +	 * nasty hackery for nested WARN() where the printing itself
>> +	 * generates one and ensures such nested messages are stored to
>> +	 * the ringbuffer before any printing resumes.
>
> It is not clear to me what hackery was meant.

Hackery = Trying to implement this feature without tracking CPU state
priorities.

> The fact is that only printk_once() or WARN_ONCE() should be used in
> the console emit/flush code paths. Any non-once printk might block
> consoles and even these nesting checks probably would not help much.

I am not sure what that has to do with it. This is a flush function,
which (for example) will be called when a warning is hit. We do _not_
want to flush the console if something more important (a panic) is
already in the process of being added to the ringbuffer.

> Anyway, I believe that we do not need this nesting counter.
> The nesting is already prevented by nbcon_context_try_acquire().
> It would not allow to take the nested lock with the same priority.

You are mixing 2 different things:

The acquire is related to ownership of a console.

The nesting is related to urgency state of a CPU.

> I would personally just describe the behavior in the commit message
> and in the comment above __nbcon_atomic_flush_all():
>
> 	* The messages are flushed only when this context is able to
> 	* get the per-console lock. Namely, it works only when the
> 	* lock is free or when this context has a higher priority
> 	* than the current owner.

Your comment is stating the obvious. All messages are only written by a
context when that context can acquire ownership.

What the check here is doing is refusing to write messages even if it
_could_ acquire ownership. It isn't about console ownership. It is about
not _wanting_ to print in nested atomic printing sections.

>> +	if (cpu_state->prio > NBCON_PRIO_NORMAL &&
>> +	    cpu_state->nesting[cpu_state->prio] != 1)
>> +		return;

[...]

>> +/**
>> + * nbcon_atomic_flush_all - Flush all nbcon consoles in atomic mode
>> + *
>> + * Context:	Any context where migration is disabled.
>
> We should make it more clear what migration is meant here. For
> example:
>
>  * Context:	Any context which could not be migrated to another CPU.

OK.

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing
  2023-09-22 17:41   ` Petr Mladek
@ 2023-09-25 13:37     ` John Ogness
  2023-09-26 12:14       ` Petr Mladek
  0 siblings, 1 reply; 44+ messages in thread
From: John Ogness @ 2023-09-25 13:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
>> Perform nbcon console atomic flushing at key call sites:
>> 
>> nbcon_atomic_exit() - When exiting from the outermost atomic
>> 	printing section.
>
> IMHO, it would not help because it all depends on
> whether nbcon_context_try_acquire() succeeds or not, see below.

Sure, but it first attempts a _safe_ flush on all nbcon consoles that
allow it, _then_ it falls back to an _unsafe_ flush (which cannot fail,
but might explode).

>> 	If the priority was NBCON_PRIO_PANIC,
>> 	attempt a second flush allowing unsafe hostile
>> 	takeovers.
>
> I was first scared that this would be called by any printk()
> in panic(). But it seems that nbcon_atomic_exit() is called
> only once, after disabling printk(). It might deserve a comment
> on where it is supposed to be used. See a proposal below.

OK.

>> console_flush_on_panic() - Called from several call sites to
>> 	trigger ringbuffer dumping in an urgent situation.
>> 
>> console_flush_on_panic() - Typically the panic() function will
>
> This is a second description of console_flush_on_panic() which
> looks like a mistake.

Oops. The first one should not be there.

>> 	take care of atomic flushing the nbcon consoles on
>> 	panic. However, there are several users of
>> 	console_flush_on_panic() outside of panic().
>
> The generic panic() seems to use console_flush_on_panic() correctly
> at the very end.
>
> Hmm, I see that console_flush_on_panic() is called also in

[...]

> Anyway, we should make clear that console_flush_on_panic() might break
> the system and should be called as the last attempt to flush consoles.
> The above arch-specific users are worth review.

In an upcoming series you will see that console_flush_on_panic() only
takes the console lock if there are legacy consoles. Ideally, eventually
there will only be nbcon consoles, so your concern would disappear.

And if those users continue to use legacy consoles, then the risks will
be the same as now.

>>   * Return:	The previous priority that needs to be fed into
>>   *		the corresponding nbcon_atomic_exit()
>>   * Context:	Any context. Disables migration.
>> + *
>> + * When within an atomic printing section, no atomic printing occurs. This
>> + * is to allow all emergency messages to be dumped into the ringbuffer before
>> + * flushing the ringbuffer.
>
> The comment sounds like it is an advantage. But I think that it would be
> a disadvantage.

Please explain. At LPC2022 we agreed it was an advantage and it is
implemented on purpose using the atomic printing sections.

> Fortunately, this is not true. Even the atomic context tries
> to flush the messages immediately when it is able to get
> the per-cpu lock. It happes via console_flush_all() called
> from console_unlock().

You are talking about legacy consoles.

The goal of the nbcon consoles is to introduce a _new_ console type to
support controlled decisions for emergency printing. Legacy consoles
will continue to work (or not work) as before.

>> @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
>>  {
>>  	struct nbcon_cpu_state *cpu_state;
>>  
>> +	__nbcon_atomic_flush_all(false);
>
> IMHO, this would not help. Either this context was able to acquire the
> lock and flush each message directly.

What do you mean by "help"? We are exiting an atomic printing
section. This is where the accumulated emergency messages are to be
printed. If this is a nested atomic printing section, it does nothing
because the outermost atomic printing section will flush.

> Or it would fail to get the lock even here.

If it fails to get the lock, we are talking about the worst case
scenario. That scenario demands unsafe printing, which is only allowed
in panic. If we are not in panic, it is assumed the current owner will
take care of it.

>> +
>>  	cpu_state = nbcon_get_cpu_state();
>>  
>> +	if (cpu_state->prio == NBCON_PRIO_PANIC)
>> +		__nbcon_atomic_flush_all(true);
>
> It would deserve a comment that nbcon_atomic_exit() is used
> in panic() to calm down the consoles completely, similar effect
> as setting the variable "suppress_printk".

Actually, it is nbcon_atomic_enter() that does the calming down, but
only for the emergency CPU. Other CPUs are allowed to print as wildly as
they want. And all CPUs are allowed to continue to fill the ringbuffer.

@suppress_printk is quite different as it globally blocks any new
messages from being stored into the ringbuffer. This is not calming down
the consoles, but rather putting a finite limit on what they produce.

>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 6ef33cefa4d0..419c0fabc481 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
>>  		console_srcu_read_unlock(cookie);
>>  	}
>>  
>> +	nbcon_atomic_flush_all();
>> +
>>  	console_flush_all(false, &next_seq, &handover);
>
> It seems that console_flush_all() tries to flush nbcon consoles
> as well. And nbcon_atomic_flush_all() is explicitly called
> just above this console_flush_on_panic(). This is from panic()
> after this patchset is applied.
>
>
> void panic(const char *fmt, ...)
> {
> [...]
> 	nbcon_atomic_flush_all();
>
> 	console_unblank();
>
> 	/*
> 	 * We may have ended up stopping the CPU holding the lock (in
> 	 * smp_send_stop()) while still having some valuable data in the console
> 	 * buffer.  Try to acquire the lock then release it regardless of the
> 	 * result.  The release will also print the buffers out.  Locks debug
> 	 * should be disabled to avoid reporting bad unlock balance when
> 	 * panic() is not being callled from OOPS.
> 	 */
> 	debug_locks_off();
> 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
>
> In other words, we try to flush nbcon consoles 3 times almost
> immediately after each other.

"Almost immediately" != "immediately". Keep in mind that this is in
atomic printing context. Generally speaking, no messages will be seen on
the consoles yet. Any time there is a significant piece (such as
console_unblank()) it would probably be wise to flush.

We will probably want even more flush points. But I wanted to at least
start with the risky points (such as console_unblank() when legacy
consoles still exist).

> It would deserve some comment on what the purpose is. Otherwise, people
> would tend to remove the "redundant" calls.

OK.

>> @@ -3903,6 +3905,10 @@ void defer_console_output(void)
>>  
>>  void printk_trigger_flush(void)
>>  {
>> +	migrate_disable();
>> +	nbcon_atomic_flush_all();
>> +	migrate_enable();
>
> Can this be actually called in NMI?

No. I need to add in_nmi() checks.

>>  	defer_console_output();
>>  }
>
> This function would deserve some description because it looks strange.
> There are three names which are contradicting each other:
>
>   + trigger_flush:    it sounds like it tells someone to do the flush
>
>   + nbcon_atomic_flush_all: does the flush immediately
>
>   + defer_console_output: sounds like it prevents the current context
> 	to flush the consoles immediately

Agreed.

> We should either choose better names and/or add comments:
>
> /**
>  * console_flush_or_trigger() - Make sure that the consoles will get flushed.
>  *
>  * Try to flush consoles when possible or queue flushing consoles like
>  * in the deferred printk.
>  *
>  * Context: Can be used in any context (including NMI?)
>  */
> void printk_flush_or_trigger(void)
> {
> 	/*
> 	 * Try to flush consoles which do not depend on console_lock()
> 	 * and support safe atomic_write()
> 	 */
 	if (!in_nmi()) migrate_disable();
> 	nbcon_atomic_flush_all();
 	if (!in_nmi()) migrate_enable();
>
> 	/* Try flushing legacy consoles or queue the deferred handling. */
> 	if (!in_nmi() && console_trylock())
> 		console_unlock();
> 	else
> 		defer_console_output();
> }

I would be OK with this name and (fixed up for NMI) implementation.
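The flush-or-defer decision in the proposed function can be modeled in
userspace. This is only a sketch of the control flow discussed above; the
stub functions and counters are illustrative stand-ins for the kernel
primitives, not the real implementations:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for kernel state; names are hypothetical. */
static bool nmi_context;
static bool console_lock_free;
static int  nbcon_flushes;
static int  legacy_flushes;
static int  deferred_triggers;

static bool in_nmi(void) { return nmi_context; }
static void nbcon_atomic_flush_all(void) { nbcon_flushes++; }
static bool console_trylock(void) { return console_lock_free; }
static void console_unlock(void) { legacy_flushes++; }
static void defer_console_output(void) { deferred_triggers++; }

/*
 * Shape of the proposed printk_flush_or_trigger(): always try the nbcon
 * consoles directly, then either flush legacy consoles under the console
 * lock or queue deferred output (always deferred when in NMI).
 */
static void printk_flush_or_trigger(void)
{
	/* migrate_disable()/migrate_enable() omitted in this userspace model */
	nbcon_atomic_flush_all();

	if (!in_nmi() && console_trylock())
		console_unlock();
	else
		defer_console_output();
}
```

The point of the model is only that the legacy path degrades to deferral in
NMI context while the nbcon path does not.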

> All in all. I feel a bit overwhelmed by the many *flush*() functions at
> the moment. Some flush only nbcon consoles. Some flush all. Some
> are using only the safe takeover/handover and some allow also
> harsh takeover. Some ignore port->lock because of bust_spinlock().
> Some are ignoring console_lock. They are called on different
> locations. The nbcon variants are called explicitly and also
> inside the generic *flush*() functions.

Agreed. This madness exists in part because we are continuing to support
legacy consoles. I tried to keep the two separate as much as
possible. But when it comes to flushing, there will be some overlap.

When working on the code, I tend to either look at the legacy consoles
_or_ the nbcon consoles. If you try to visualize the whole picture on a
system with legacy and nbcon consoles, it is a bit overwhelming. I
recommend focussing on them separately. I.e. when talking about nbcon
consoles, there is no reason to be concerned about the state of the
console lock or the port lock. When talking about legacy consoles, there
is no reason to be concerned about CPU atomic printing states.

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-09-25  9:25     ` John Ogness
@ 2023-09-25 16:04       ` Petr Mladek
  2023-10-05 12:51         ` John Ogness
  0 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-09-25 16:04 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Mon 2023-09-25 11:31:54, John Ogness wrote:
> On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
> >> Note that when a CPU is in a priority elevated state, flushing
> >> only occurs when dropping back to a lower priority. This allows
> >> the full set of printk records (WARN/OOPS/PANIC output) to be
> >> stored in the ringbuffer before beginning to flush the backlog.
> >
> > The above paragraph is a bit confusing. The code added by this patch
> > does not do any flushing.
> 
> You are right. I should put this patch after patch 5 "printk: nbcon:
> Provide function for atomic flushing" to simplify the introduction.
> 
> > I guess that this last paragraph is supposed to explain why the
> > "nesting" array is needed.
> 
> No, it is explaining how this feature works in general. The term
> "priority elevated state" means the CPU is in an atomic write section.

This did not help me much at first. But I got it later after
connecting more dots ;-)

IMHO, the problem was that the commit message introduced the terms
in the following order:

   + WARN/OOPS/PANIC require printing out immediately

   + per-CPU state to denote the priority/urgency

   + functions to mark the beginning/end where the urgent messages
     are generated

   + when CPU is in priority elevated state, flushing only occurs
     when dropping back to a lower priority

From my POV:

   + It did not mention/explained "atomic write" at all

   + It said that the urgent messages required immediate printing.
     And later, it said that they would get flushed later. These
     two statements contradict each other.

   + The handling of priorities is not only about CPU nesting.
     The same rules should apply also when other CPU is printing
     messages in a higher priority context.

My proposal:

  + Use the term "higher priority context" for all WARN/OOPS/PANIC
    messages.

  + Use "emergency priority context" or "nbcon emergency context"
    when talking about NBCON_PRIO_EMERGENCY context to avoid
    confusion with the printk log levels.

  + use "panic priority context or "nbcon panic context" for the panic
    one if we really want to add enter/exit for the panic context.

  + We must somewhere explain the "atomic context" and the "atomic_write"
    callback. One important question is why it is atomic. Is it because it

      + _can_ be called in atomic context?
      + _must_ be called in atomic context?

    It is called also from console_unlock() for boot messages
    so it need not be in atomic context.

    What about renaming it to "nbcon_write" to avoid this confusion?


> The "nesting" array is needed in order to support a feature that is not
> explained in the commit message: If nested OOPS/WARN/PANIC occur, only
> the outermost OOPS/WARN/PANIC will do the flushing. I will add this
> information to the commit message.

What is the motivation for the feature, please?

1. Either we want to flush the messages in the higher priority context
   ASAP.

   The per-CPU lock has been designed exactly for this purpose. It
   would not need any extra nesting counter. And it would work
   consistently also when the lock is acquired on another CPU.

   It is simple, the context will either get the per-console lock or
   not. The (nested) context might get the lock only when it has higher
   priority. The flush would be called directly from vprintk_emit().

   I always thought that this was the planned behavior.

   IMHO, we want it. A nested panic() should be able to takeover
   the console and flush messages ASAP. It will never return.


2. Or we want to wait until all messages in the higher priority context
   are stored into the ring buffer and flush them by the caller.

   Who would actually flush the higher priority messages?
   WARN() caller?
   The timer callback which detected softlockup?
   Or a completely different context?
   Who would flush panic() messages when panic() interrupted
   locked CPU?
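The per-console lock rule behind option 1 can be sketched as a tiny
userspace model. This is an assumption-laden simplification of what
nbcon_context_try_acquire() does (a single console, no handover/unsafe
states); the enum values and helper are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

enum nbcon_prio {
	NBCON_PRIO_NONE,	/* modeled "unowned" state, not a real prio */
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
};

/* Current owner priority of a single modeled console. */
static enum nbcon_prio owner_prio = NBCON_PRIO_NONE;

/*
 * Option 1 in a nutshell: a (possibly nested) context may take the
 * per-console lock only when the console is free or the new context
 * has strictly higher priority than the current owner.
 */
static bool nbcon_try_acquire(enum nbcon_prio prio)
{
	if (owner_prio != NBCON_PRIO_NONE && prio <= owner_prio)
		return false;
	owner_prio = prio;
	return true;
}
```

Under this rule a nested panic() can always take over and flush ASAP,
while same-priority contention simply fails the acquire.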


My proposal:

There are only two higher priority contexts:

  + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()

  + NBCON_PRIO_EMERGENCY context would require some enter/exit wrappers
    and tracking. But it does not necessarily need to be a per-CPU
    variable.

I am thinking about adding "int printk_state" into struct task_struct.
It might be useful also for other things, e.g. for storing the last
log level of non-finished message. Entering section with enforced
minimal loglevel or so.


Then we could have:

#define PRINTK_STATE_EMERGENCY_MASK   0x000000ff
#define PRINTK_STATE_EMERGENCY_OFFSET 0x00000001

void nbcon_emergency_enter(&printk_state)
{
	*printk_state = current->printk_state;

	WARN_ON_ONCE((*printk_state & PRINTK_STATE_EMERGENCY_MASK) == PRINTK_STATE_EMERGENCY_MASK);

	current->printk_state = *printk_state + PRINTK_STATE_EMERGENCY_OFFSET;
}

void nbcon_emergency_exit(printk_state)
{
	WARN_ON_ONCE(!(current->printk_state & PRINTK_STATE_EMERGENCY_MASK))

	current->printk_state = printk_state;
}

enum nbcon_prio nbcon_get_default_prio(void)
{
	if (panic_cpu == raw_smp_processor_id())
		return NBCON_PRIO_PANIC;

	/* Does current always exist? What about fork? */
	if (current && (current->printk_state & PRINTK_STATE_EMERGENCY_MASK))
		return NBCON_PRIO_EMERGENCY;

	return NBCON_PRIO_NORMAL;
}

IMHO, it should be race free. And we could use it to initialize
struct nbcon_context. The final decision will be left for
the nbcon_ctxt_try_acquire().
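The nesting behavior of the proposed mask/offset scheme can be checked in
a few lines of userspace C. This is a sketch under the assumption of a
single task (current->printk_state is modeled by one global int); the WARNs
are dropped and only the counting logic is kept:

```c
#include <assert.h>

#define PRINTK_STATE_EMERGENCY_MASK   0x000000ff
#define PRINTK_STATE_EMERGENCY_OFFSET 0x00000001

enum nbcon_prio { NBCON_PRIO_NORMAL, NBCON_PRIO_EMERGENCY, NBCON_PRIO_PANIC };

/* Models current->printk_state for a single task. */
static int printk_state;

/* Entering bumps the nesting counter held in the low byte... */
static void nbcon_emergency_enter(int *saved)
{
	*saved = printk_state;
	printk_state = *saved + PRINTK_STATE_EMERGENCY_OFFSET;
}

/* ...and exiting restores the snapshot, so nesting unwinds naturally. */
static void nbcon_emergency_exit(int saved)
{
	printk_state = saved;
}

static enum nbcon_prio nbcon_get_default_prio(void)
{
	if (printk_state & PRINTK_STATE_EMERGENCY_MASK)
		return NBCON_PRIO_EMERGENCY;
	return NBCON_PRIO_NORMAL;
}
```

Nested emergency sections keep the elevated priority until the outermost
exit, without any per-CPU array.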

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all()
  2023-09-19 23:08 ` [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all() John Ogness
@ 2023-09-26 11:34   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-26 11:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2023-09-20 01:14:52, John Ogness wrote:
> In atomic printing sections, the atomic nbcon consoles have
> their own behavior of allowing all urgent messages to be
> stored into the ringbuffer and then flushing afterwards.
> 
> However, the nbcon consoles must also emit messages when not
> within atomic printing sections. For this, the existing
> console_flush_all() function can be used.

I am not sure if I get it correctly. My understanding
of the above two paragraphs is that urgent messages will be
buffered and flushed later while normal messages would
be flushed immediately. This is exactly the opposite
from what I would expect.

My expectations are that printk() would try to flush
also nbcon consoles during early boot when the kthreads
are not available or in the emergency and panic contexts.

Of course, it might fail when the console has another owner
and it has the same or higher priority or when it is in
an unsafe state.

> Provide nbcon_console_emit_next_record(), which acts as the
> nbcon variant of console_emit_next_record(). Call this variant
> within console_flush_all() for nbcon consoles.

  + nbcon_console_emit_next_record()
    + nbcon_atomic_emit_one()
      + nbcon_emit_next_record()
	+ con->write_atomic()

It interleaves APIs with and without "atomic" in the name.
Also I guess that nbcon_emit_next_record() will be used
by the printk kthread in "non-atomic" mode. But I might
be wrong.

The more I think about it the more it looks that the term
"atomic" creates more harm (confusion) than good.

> Note that when in an atomic printing section,
> nbcon_console_emit_next_record() does nothing. This is because
> atomic printing sections will handle the nbcon flushing when
> exiting the outermost atomic printing section.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/internal.h |  2 ++
>  kernel/printk/nbcon.c    | 46 ++++++++++++++++++++++++++++++++++++++++
>  kernel/printk/printk.c   | 26 +++++++++++++++--------
>  3 files changed, 65 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 6780911fa8f2..a3c6b5ce80e4 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -77,6 +77,7 @@ void nbcon_seq_force(struct console *con, u64 seq);
>  bool nbcon_alloc(struct console *con);
>  void nbcon_init(struct console *con);
>  void nbcon_free(struct console *con);
> +bool nbcon_console_emit_next_record(struct console *con);
>  
>  /*
>   * Check if the given console is currently capable and allowed to print
> @@ -131,6 +132,7 @@ static inline void nbcon_seq_force(struct console *con, u64 seq) { }
>  static inline bool nbcon_alloc(struct console *con) { return false; }
>  static inline void nbcon_init(struct console *con) { }
>  static inline void nbcon_free(struct console *con) { }
> +static bool nbcon_console_emit_next_record(struct console *con) { return false; }
>  
>  static inline bool console_is_usable(struct console *con, short flags) { return false; }
>  
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 82e6a1678363..de473a1003d8 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1017,6 +1017,52 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
>  	return prb_read_valid(prb, ctxt->seq, NULL);
>  }
>  
> +/**
> + * nbcon_console_emit_next_record - Print one record for an nbcon console
> + *					in atomic mode

I am completely lost now. The API does not have "atomic" in the
name but it is supposed to be called in atomic context, whatever
that means.

> + * @con:	The console to print on
> + *
> + * Return:	True if a record could be printed, otherwise false.
> + *
> + * This function is meant to be called by console_flush_all() to atomically
> + * print records on nbcon consoles. Essentially it is the nbcon version of
> + * console_emit_next_record().
> + *
> + * This function also returns false if the current CPU is in an elevated
> + * atomic priority state in order to allow the CPU to get all of the
> + * emergency messages into the ringbuffer first.
> + */
> +bool nbcon_console_emit_next_record(struct console *con)
> +{
> +	struct nbcon_cpu_state *cpu_state;
> +	bool progress = false;
> +
> +	migrate_disable();
> +
> +	cpu_state = nbcon_get_cpu_state();
> +
> +	/*
> +	 * Atomic printing from console_flush_all() only occurs if this
> +	 * CPU is not in an elevated atomic priority state. If it is, the
> +	 * atomic printing will occur when this CPU exits that state. This
> +	 * allows a set of emergency messages to be completely stored in
> +	 * the ringbuffer before this CPU begins flushing.
> +	 */
> +	if (cpu_state->prio <= NBCON_PRIO_NORMAL) {

It does something only when the CPU is in the "default normal state" or
an "explicitly forced normal state".

The "explicitly forced normal" state probably happens after
nbcon_atomic_enter(NBCON_PRIO_NORMAL). Do we want anyone to call
this? I would expect that the explicit context with direct flush
would be used only for emergency and panic messages.

> +		struct nbcon_write_context wctxt = { };
> +		struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +
> +		ctxt->console	= con;
> +		ctxt->prio	= NBCON_PRIO_NORMAL;
> +
> +		progress = nbcon_atomic_emit_one(&wctxt);
> +	}
> +
> +	migrate_enable();
> +
> +	return progress;
> +}


My proposal:

1. Add the API for entering emergency context and getting the context.
   I have already mentioned this in reply to the 4th patch. Here it is:

/* Allow to enter printk context with emergency prio */
#define PRINTK_STATE_EMERGENCY_MASK		0x000000ff
#define PRINTK_STATE_EMERGENCY_OFFSET		0x00000001

/* Allow unsafe takeover */
#define PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT	0x00000100

void nbcon_emergency_enter(&printk_state)
{
	*printk_state = current->printk_state;

	WARN_ON_ONCE((*printk_state & PRINTK_STATE_EMERGENCY_MASK) == PRINTK_STATE_EMERGENCY_MASK);

	current->printk_state = *printk_state + PRINTK_STATE_EMERGENCY_OFFSET;
}

void nbcon_emergency_exit(printk_state)
{
	WARN_ON_ONCE(!(current->printk_state & PRINTK_STATE_EMERGENCY_MASK))

	current->printk_state = printk_state;
}

enum nbcon_prio nbcon_get_default_prio(void)
{
	if (panic_cpu == raw_smp_processor_id())
		return NBCON_PRIO_PANIC;

	/* Does current always exist? What about fork? */
	if (current && (current->printk_state & PRINTK_STATE_EMERGENCY_MASK))
		return NBCON_PRIO_EMERGENCY;

	return NBCON_PRIO_NORMAL;
}

/*
 * Extend this by adding a bit which would allow to enable the unsafe
 * takeover.
 */
bool nbcon_is_allowed_unsafe_takeover(void)
{
	bool allow_unsafe_takeover;

	allow_unsafe_takeover = (current->printk_state &
				  PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT);

	if (allow_unsafe_takeover)
		if (WARN_ON_ONCE(raw_smp_processor_id() != panic_cpu))
			allow_unsafe_takeover = false;

	return allow_unsafe_takeover;
}
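The gating of the unsafe-takeover bit on the panic CPU can also be checked
standalone. This is a hedged userspace sketch: the globals stand in for
current->printk_state, the panic_cpu variable and raw_smp_processor_id(),
and the WARN_ON_ONCE() is reduced to plain clearing of the flag:

```c
#include <assert.h>
#include <stdbool.h>

#define PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT 0x00000100

static int printk_state;	/* models current->printk_state */
static int panic_cpu = -1;	/* models the global panic_cpu */
static int this_cpu;		/* models raw_smp_processor_id() */

/*
 * The unsafe-takeover flag only counts when this CPU is the panic CPU;
 * anywhere else it is ignored (the kernel version would WARN here).
 */
static bool nbcon_is_allowed_unsafe_takeover(void)
{
	bool allow = printk_state & PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT;

	if (allow && this_cpu != panic_cpu)
		allow = false;

	return allow;
}
```

So even a leaked or inherited flag bit cannot enable an unsafe takeover
outside of the panic CPU.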


2. Have a function for emitting a single line in the given write
   context and with already acquired console lock.

   Simply rename the existing:

   nbcon_emit_next_record() -> nbcon_wctxt_emit_next_record_acquired()
    or                      -> nbcon_wctxt_emit_next_record_locked()



3. Add the following APIs

/**
 * nbcon_wctxt_emit_next_record - try to emit next record using the
 *		given write context.
 *
 * @wctxt:			An initialized write context struct to use
 *				for this context
 *
 * Returns false if the given console could not print a record or there are
 * no more records to print, otherwise true.
 */
bool nbcon_wctxt_emit_next_record(struct nbcon_write_context *wctxt)
{
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

	if (!nbcon_context_try_acquire(ctxt))
		return false;

	if (!nbcon_wctxt_emit_next_record_acquired(wctxt)) {
		nbcon_context_release(ctxt);
		return false;
	}

	nbcon_context_release(ctxt);

	return prb_read_valid(prb, ctxt->seq, NULL);
}

/**
 * nbcon_emit_next_record_no_wait - try to emit next record on the given console
 *				  It will not wait when the console is locked.
 * @con:	The console to print on
 *
 * Return:	True if a record could be printed, otherwise false.
 */
bool nbcon_emit_next_record_no_wait(struct console *con)
{
	struct nbcon_write_context wctxt = { };
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);

	ctxt->console			= con;
	ctxt->prio			= nbcon_get_default_prio();
	ctxt->allow_unsafe_takeover	= nbcon_is_allowed_unsafe_takeover();

	return nbcon_wctxt_emit_next_record(&wctxt);
}

/* Add to the existing console_flush_all() */
static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
{
[...]
	do {
		any_progress = false;

		cookie = console_srcu_read_lock();
		for_each_console_srcu(con) {
			short flags = console_srcu_read_flags(con);
			u64 printk_seq;
			bool progress;

			if (!console_is_usable(con, flags))
				continue;
			any_usable = true;

+ 			if (flags & CON_NBCON) {
+ 				progress = nbcon_emit_next_record_no_wait(con);
+ 				printk_seq = nbcon_seq_read(con);
			} else {
				progress = console_emit_next_record(con, handover, cookie);
[...]
}

/*
 * Like console_flush_all() but flushing only nbcon consoles.
 *
 * It might be used in console_flush_on_panic()
 * before flushing also legacy consoles.
 */
void nbcon_flush_all(void)
{
	struct nbcon_write_context wctxt = { };
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
	struct console *con;
	bool any_progress;
	enum nbcon_prio prio;
	bool allow_unsafe_takeover;
	int cookie;

	prio = nbcon_get_default_prio();
	allow_unsafe_takeover = nbcon_is_allowed_unsafe_takeover();

	do {
		any_progress = false;

		cookie = console_srcu_read_lock();
		for_each_console_srcu(con) {
			short flags = console_srcu_read_flags(con);
			bool progress;

			if (!(flags & CON_NBCON))
				continue;

			if (!console_is_usable(con, flags))
				continue;

			memset(ctxt, 0, sizeof(*ctxt));
			ctxt->console			= con;
			ctxt->spinwait_max_us		= 2000;
			ctxt->prio			= prio;
			ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;

			progress = nbcon_wctxt_emit_next_record(&wctxt);
			if (!progress)
				continue;
			any_progress = true;
		}
		console_srcu_read_unlock(cookie);
	} while (any_progress);
}
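The "loop until a full pass makes no progress" pattern shared by
console_flush_all() and the proposed nbcon_flush_all() can be demonstrated
in isolation. A minimal sketch, assuming three modeled consoles that start
at different sequence numbers (the names and sequence values are invented
for illustration):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CONSOLES 3
#define HEAD_SEQ    10

/* Per-console next-to-print sequence numbers, at different start points. */
static int con_seq[NR_CONSOLES] = { 0, 7, 10 };

/* Emit at most one record for one console; returns true on progress. */
static bool emit_next_record(int con)
{
	if (con_seq[con] >= HEAD_SEQ)
		return false;
	con_seq[con]++;
	return true;
}

/*
 * The flush pattern: keep iterating over all consoles, one record per
 * console per pass, until a full pass makes no progress. Every console
 * then has caught up to the ringbuffer head.
 */
static void flush_all(void)
{
	bool any_progress;

	do {
		any_progress = false;
		for (int con = 0; con < NR_CONSOLES; con++)
			if (emit_next_record(con))
				any_progress = true;
	} while (any_progress);
}
```

One record per console per pass keeps a slow console from starving the
others, which is the same reason the kernel loop is structured this way.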


void nbcon_flush_on_panic(void)
{
	if (WARN_ON_ONCE(raw_smp_processor_id() != panic_cpu))
		return;

	current->printk_state |= PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT;

	nbcon_flush_all();

	/*
	 * Maybe we should not clear it. This should be called at
	 * the very end of panic. If we allowed an unsafe takeover once
	 * then we might also keep it allowed.
	 *
	 * Then the additional explicit flushes would not be needed.
	 * I mean the one after "Rebooting in %d seconds.." or
	 * after "---[ end Kernel panic - not syncing.."
	 *
	 * Also it would allow to see messages printed in
	 * emergency_restart().
	 *
	 * In other words, the unsafe flush either worked or it did not.
	 * It cannot get worse by trying more unsafe takeovers.
	 */
	/* current->printk_state &= ~PRINTK_STATE_ALLOW_UNSAFE_TAKEOVER_BIT; */
}

IMHO, this would simplify the API and logic a lot.

Note that all the complexity with different priorities and unsafe
flushes is already handled by nbcon_context_try_acquire(). I do
not see any reason to add yet another logic above.

Also I would prefer to rename

  con->atomic_write()  -> con->nbcon_write()

and get rid of the "atomic" word completely. I know that it was
mentioned during the original session with Linus. But it is just
a name. The behavior is more important. And I think that the word
"atomic" adds quite some confusion and even complexity in this
implementation.

Does it make any sense, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing
  2023-09-25 13:37     ` John Ogness
@ 2023-09-26 12:14       ` Petr Mladek
  2023-10-05 13:59         ` John Ogness
  0 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-09-26 12:14 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2023-09-25 15:43:03, John Ogness wrote:
> On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
> >> console_flush_on_panic() - Called from several call sites to
> >> 	trigger ringbuffer dumping in an urgent situation.
> >> 
> >> console_flush_on_panic() - Typically the panic() function will
> >
> > This is a second description of console_flush_of_panic() which
> > looks like a mistake.
> 
> Oops. The first one should not be there.
> 
> >> 	take care of atomic flushing the nbcon consoles on
> >> 	panic. However, there are several users of
> >> 	console_flush_on_panic() outside of panic().
> >
> > The generic panic() seems to use console_flush_on_panic() correctly
> > at the very end.
> >
> > Hmm, I see that console_flush_on_panic() is called also in
> 
> [...]
> 
> > Anyway, we should make clear that console_flush_on_panic() might break
> > the system and should be called as the last attempt to flush consoles.
> > The above arch-specific users are worth review.
> 
> In an upcoming series you will see that console_flush_on_panic() only
> takes the console lock if there are legacy consoles. Ideally, eventually
> there will only be nbcon consoles, so your concern would disappear.

The legacy consoles have two risk levels:

  1. port->lock is ignored after bust_spinlocks()
  2. even console_lock is ignored in console_flush_on_panic()

The nbcon consoles have only one risk level:

  1. unsafe takeover is allowed

First, I thought that we wanted to allow the unsafe takeover in
console_flush_on_panic(). In that case, this function would
be dangerous even for nbcon consoles.

Now, I remember that we wanted to allow it only before entering
the infinite loop (blinking diodes). In this case,
console_flush_on_panic() would be really safe for nbcon consoles.


> And if those users continue to use legacy consoles, then the risks will
> be the same as now.
> 
> >>   * Return:	The previous priority that needs to be fed into
> >>   *		the corresponding nbcon_atomic_exit()
> >>   * Context:	Any context. Disables migration.
> >> + *
> >> + * When within an atomic printing section, no atomic printing occurs. This
> >> + * is to allow all emergency messages to be dumped into the ringbuffer before
> >> + * flushing the ringbuffer.
> >
> > The comment sounds like it is an advantage. But I think that it would be
> > a disadvantage.
> 
> Please explain. At LPC2022 we agreed it was an advantage and it is
> implemented on purpose using the atomic printing sections.

I am sorry but I do not remember the details. Do you remember
the motivation, please?

In each case, we can't just say that this works by design
because someone somewhere agreed on it. We must explain
why this is better and I do not see it at the moment.

I am terribly sorry if I agreed with this and I disagree now.
I have never been good in life discussion because there is
no enough time to think about all consequences.

Anyway, the proposed behavior (agreed on LPC2022) seems
to contradict the original plan from LPC 2019, see
https://lore.kernel.org/all/87k1acz5rx.fsf@linutronix.de/
Namely:

  --- cut ---
  3. Rather than defining emergency _messages_, we define an emergency
  _state_ where the kernel wants to flush the messages immediately before
  dying. Unlike oops_in_progress, this state will not be visible to
  anything outside of the printk infrastructure.

  4. When in emergency state, the kernel will use a new console callback
  write_atomic() to flush the messages in whatever context the CPU is in
  at that moment. Only consoles that implement the NMI-safe write_atomic()
  will be able to flush in this state.
  --- cut ---

We wanted to flush it ASAP.

I wonder if we discussed some limitations where the messages
could not be flushed immediately. Maybe, we discussed a scenario
when there are many pending messages which would delay
flushing the emergency ones. But we need to flush them anyway.

Now, I do not see any real advantage to first storing all messages
and then flushing them later in the same context.

OK, flushing them immediately might delay the first emergency
message. But storing everything first might cause overwriting of
the first emergency messages.

I hope that my proposed change would actually make things easier
and will not affect that much the upcoming patchsets.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic
  2023-09-19 23:08 ` [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic John Ogness
@ 2023-09-27 12:02   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-27 12:02 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Andy Shevchenko,
	Guilherme G. Piccoli, Arnd Bergmann

On Wed 2023-09-20 01:14:53, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Invoke the atomic write enforcement functions for warn/panic to
> ensure that the information gets out to the consoles.
> 
> For the panic case, add explicit intermediate atomic flush
> calls to ensure immediate flushing at important points.
> Otherwise the atomic flushing only occurs when dropping out of
> the elevated priority, which for panic may never happen.

It would be great to avoid the need for the explicit flushes
except for the final unsafe one. Otherwise, we would play
another Whack-a-mole game. People would report that
panic() failed and they needed to add another explicit flush
to find the culprit...

> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -275,6 +275,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>   */
>  void panic(const char *fmt, ...)
>  {
> +	enum nbcon_prio prev_prio;
>  	static char buf[1024];
>  	va_list args;
>  	long i, i_next = 0, len;
> @@ -322,6 +323,8 @@ void panic(const char *fmt, ...)
>  	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
>  		panic_smp_self_stop();
>  
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_PANIC);
> +

It would make sense to flush nbcon consoles the safe way
at this point before we allow dangerous games for legacy
consoles via bust_spinlock().

>  	console_verbose();
>  	bust_spinlocks(1);
>  	va_start(args, fmt);
> @@ -382,6 +385,8 @@ void panic(const char *fmt, ...)
>  	if (_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
>  
> +	nbcon_atomic_flush_all();

There should be one more safe flush after dump_stack(), just
in case the kexec fails. The legacy code also tried to flush
the consoles on each printk() call.

...

I do not want to review or comment each added or needed
nbcon_atomic_flush_all(). I hope that they won't be needed
in the end.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops
  2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
  2023-09-19 23:36   ` John Ogness
  2023-09-20 13:28   ` Andy Shevchenko
@ 2023-09-27 13:15   ` Petr Mladek
  2 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-27 13:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Kees Cook, Luis Chamberlain, Andrew Morton,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Guilherme G. Piccoli, Andy Shevchenko

On Wed 2023-09-20 01:14:54, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
> 
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
> 
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -630,6 +634,36 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	enum nbcon_prio prev_prio;
> +	int cpu = -1;
> +
> +	/*
> +	 * If this turns out to be the first CPU in oops, this is the
> +	 * beginning of the outermost atomic section. Otherwise it is
> +	 * the beginning of an inner atomic section.
> +	 */

This sounds strange. What is the advantage of having the inner
atomic context, please? It covers only messages printed inside
oops_enter() and not the whole oops_enter()/exit(). Also see below.

> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> +	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
> +		/*
> +		 * This is the first CPU in oops. Save the outermost
> +		 * @prev_prio in order to restore it on the outermost
> +		 * matching oops_exit(), when @oops_nesting == 0.
> +		 */
> +		oops_prev_prio = prev_prio;
> +
> +		/*
> +		 * Enter an inner atomic section that ends at the end of this
> +		 * function. In this case, the nbcon_atomic_enter() above
> +		 * began the outermost atomic section.
> +		 */
> +		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +	}
> +
> +	/* Track nesting when this CPU is the owner. */
> +	if (cpu == -1 || cpu == smp_processor_id())
> +		oops_nesting++;
> +
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
> @@ -637,6 +671,9 @@ void oops_enter(void)
>  
>  	if (sysctl_oops_all_cpu_backtrace)
>  		trigger_all_cpu_backtrace();
> +
> +	/* Exit inner atomic section. */
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

This will not flush the messages when:

   + This CPU owns oops_cpu. The flush will have to wait for exiting
     the outer loop.

     In this case, the inner atomic context is not needed.


   + oops_cpu is owned by another CPU, and the other CPU is
     just flushing the messages while holding the per-console
     lock.

     The good thing is that the messages printed by this oops_enter()
     would likely get flushed by the other CPU.

     The bad thing is that oops_exit() on this CPU won't call
     nbcon_atomic_exit() so that the following OOPS messages
     from this CPU might need to wait for the printk kthread.
     IMHO, this is not what we want.


One solution would be to store prev_prio in per-CPU array
so that each CPU could call its own nbcon_atomic_exit().

But I start liking more and more the idea with storing
and counting nested emergency contexts in struct task_struct.
It is the alternative implementation in reply to the 7th patch,
https://lore.kernel.org/r/ZRLBxsXPCym2NC5Q@alley

Then it will be enough to simply call:

   + nbcon_emergency_enter() in oops_enter()
   + nbcon_emergency_exit() in oops_exit()

Best Regards,
Petr

PS: I just hope that you didn't add all this complexity just because
    we preferred this behavior at LPC 2022. Especially I hope
    that it was not me who proposed and preferred this.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls
  2023-09-19 23:08 ` [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls John Ogness
@ 2023-09-27 15:00   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-27 15:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu

On Wed 2023-09-20 01:14:55, John Ogness wrote:
> Invoke the atomic write enforcement functions for rcu stalls to
> ensure that the information gets out to the consoles.
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/rcu/tree_stall.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 6f06dc12904a..0a58f8b233d8 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -8,6 +8,7 @@
>   */
>  
>  #include <linux/kvm_para.h>
> +#include <linux/console.h>
>  
>  //////////////////////////////////////////////////////////////////////////////
>  //
> @@ -582,6 +583,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
>  
>  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  {
> +	enum nbcon_prio prev_prio;
>  	int cpu;
>  	unsigned long flags;
>  	unsigned long gpa;
> @@ -597,6 +599,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>  
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
>  	/*
>  	 * OK, time to rat on our buddy...
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug
> @@ -651,6 +655,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	panic_on_rcu_stall();
>  
>  	rcu_force_quiescent_state();  /* Kick them all. */
> +
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

The locations looks reasonable to me. I just hope that we would
use another API: nbcon_emergency_enter()/exit() in the end.

Note that the new API would allow flushing the messages in
the emergency context immediately from printk().

In that case, we would need to handle nmi_trigger_cpumask_backtrace()
in some special way.

This function would be called from the emergency context but
the nmi_cpu_backtrace() callbacks would be called on other
CPUs in normal context.

For this case I would add something like:

void nbcon_flush_all_emergency(void)
{
	enum nbcon_prio prio = nbcon_get_default_prio();

	if (prio >= NBCON_PRIO_EMERGENCY)
		nbcon_flush_all();
}

, where the POC of nbcon_get_default_prio() and nbcon_flush_all()
was in the reply to the 7th patch, see
https://lore.kernel.org/all/ZRLBxsXPCym2NC5Q@alley/


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats
  2023-09-19 23:08 ` [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats John Ogness
@ 2023-09-29  8:31   ` Petr Mladek
  0 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-09-29  8:31 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng

On Wed 2023-09-20 01:14:56, John Ogness wrote:
> Invoke the atomic write enforcement functions for lockdep
> splats to ensure that the information gets out to the consoles.
> 
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3967,9 +3968,13 @@ static void
>  print_usage_bug(struct task_struct *curr, struct held_lock *this,
>  		enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
>  {
> +	enum nbcon_prio prev_prio;
> +
>  	if (!debug_locks_off() || debug_locks_silent)
>  		return;
>  
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
>  	pr_warn("\n");
>  	pr_warn("================================\n");
>  	pr_warn("WARNING: inconsistent lock state\n");
> @@ -3998,6 +4003,8 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
>  
>  	pr_warn("\nstack backtrace:\n");
>  	dump_stack();
> +
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
>  }

The location of the emergency context looks good. I have just proposed
another way for tracking the emergency context. It would allow to
call nbcon_emergency_enter()/exit() without any parameter,
see https://lore.kernel.org/r/ZRLBxsXPCym2NC5Q@alley

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-09-25 16:04       ` Petr Mladek
@ 2023-10-05 12:51         ` John Ogness
  2023-10-06 12:51           ` panic context: was: " Petr Mladek
                             ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: John Ogness @ 2023-10-05 12:51 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-09-25, Petr Mladek <pmladek@suse.com> wrote:
> From my POV:
>
>    + It did not mention/explained "atomic write" at all

Agreed.

>    + It said that the urgent messages required immediate printing.
>      And later, it said that they would get flushed later. These
>      statements contradict each other.

I agree that the wording needs to be dramatically improved. The term
"urgent message" was not meant to represent a single printk() call.

>    + The handling of priorities is not only about CPU nesting.
>      The same rules should apply also when other CPU is printing
>      messages in a higher priority context.

Sorry, I do not understand what you mean here. Each CPU is responsible
for its own state. If another CPU is in a higher priority state, that
CPU will be responsible for ensuring its own WARN/OOPS is stored and
flushed. (From v2 I see that the CPU does not try hard enough. I would
fix that for v3.)

> My proposal:
>
>   + Use the term "higher priority context" for all WARN/OOPS/PANIC
>     messages.
>
>   + Use "emergency priority context" or "nbcon emergency context"
>     when talking about NBCON_PRIO_EMERGENCY context to avoid
>     confusion with the printk log levels.
>
>   + use "panic priority context or "nbcon panic context" for the panic
>     one if we really want to add enter/exit for the panic context.

There are 3 different types of priority:

1. The printk record priority: KERN_ERR, KERN_WARNING, etc.

2. The priority of a console owner: used for controlled takeovers.

3. The priority state of a CPU: only elevated for urgent messages, used
to store all urgent messages and then afterwards print directly by
taking ownership of consoles.

I need to choose terminology carefully to make it easy to distinguish
between these 3 types. v2 failed to name, describe, and document this
correctly.

>   + We must somewhere explain the "atomic context" and "atomic_write"
>     callback. One important question is why it is atomic. Is it because it
>
>       + _can_ be called in atomic context?
>       + _must_ be called in atomic context?

Its main reason for existing is because it can be called from atomic
(including NMI) contexts. But like atomic_t, it can be used from any
context.

>     It is called also from console_unlock() for boot messages
>     so it need not be in atomic context.
>
>     What about renaming it to "nbcon_write" to avoid this confusion?

When we introduce the threads, there will be a 2nd callback (currently
planned write_thread()). This callback is guaranteed to be called from a
printing kthread, which for console drivers like fbcon will prove to be
helpful in cleaning up its code.

I will reserve the word "atomic" _only_ when talking about which
printing callback is used. That should help to avoid associating the
callback with a certain context or priority. But I think the name
"write_atomic" is appropriate.

>> The "nesting" array is needed in order to support a feature that is not
>> explained in the commit message: If nested OOPS/WARN/PANIC occur, only
>> the outermost OOPS/WARN/PANIC will do the flushing. I will add this
>> information to the commit message.
>
> What is the motivation for the feature, please?

During the demo at LPC2022 we had the situation that there was a large
backlog when a WARN was hit. With current mainline the first line of the
WARN is put into the ringbuffer and then the entire backlog is flushed
before storing the rest of the WARN into the ringbuffer. At the time it
was obvious that we should finish storing the WARN message and then
start flushing the backlog.

> 1. Either we want to flush the messages in the higher priority context
>    ASAP.
>
>    The per-CPU lock has been designed exactly for this purpose. It
>    would not need any extra nesting counter. And it would work
>    consistently also when the lock is acquired on another CPU.
>
>    It is simple, the context will either get the per-console lock or
>    not. The (nested) context might get the lock only when it has higher
>    priority. The flush would be called directly from vprintk_emit().
>
>    I always thought that this was the planned behavior.

It was. But then it was suggested by Thomas and acked by Linus that we
were taking the wrong approach. Rather than a global state for all the
consoles, each console should have separate owners with states, allowing
handovers/takeovers. And CPUs should have urgency states to control how
the ringbuffer is stored and when direct printing should take place for
that CPU. This idea is similar to the cpu_sync approach, but with
timeouts, handovers/takeovers, and is per-console.

This approach gives us a lot of flexibility to enforce logical and safe
policies for storing and printing in different situations. And if
named/documented correctly, I think it will be easy to understand.

> 2. Or we want to wait until all messages in the higher priority context
>    are stored into the ring buffer and flush them by the caller.

Yes.

>    Who would actually flush the higher messages?
>    WARN() caller?

Yes.

>    The timer callback which detected softlockup?

Yes.

>    Or a completely different context?

Another CPU could do some helpful backlog printing if it sees new
messages and is able to become an owner. But it is not the CPU that is
responsible for making certain that the messages have been printed.

>    Who would flush panic() messages when panic() interrupted
>    locked CPU?

The panic CPU can takeover (as a last resort, unsafely if
necessary). The panic CPU is responsible for getting those messages out.

> My proposal:
>
> There are only two higher priority contexts:
>
>   + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()

This is the case with v2.

>   + NBCON_PRIO_EMERGENCY contex would require some enter/exit wrappers
>     and tracking. But it does not necessarily need to be per-CPU
>     variable.
>
> I think about adding "int printk_state" into struct task_struct.
> It might be useful also for other things, e.g. for storing the last
> log level of non-finished message. Entering section with enforced
> minimal loglevel or so.

printk() calls are quite rare. And warning states are even more
rare. IMHO adding such a field to every task is a huge price to
pay. Also, printk operates in non-task contexts (hardirq, nmi). Although
@current probably points to something existing, there could be some
tricky corner cases.

IMHO per-cpu urgency state variables and per-console owner state
variables allow a much simpler picture in all contexts.

My proposal:

You have provided excellent feedback regarding naming and
documentation. Allow me to fix these things to clarify the various
functions and their roles.

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing
  2023-09-26 12:14       ` Petr Mladek
@ 2023-10-05 13:59         ` John Ogness
  0 siblings, 0 replies; 44+ messages in thread
From: John Ogness @ 2023-10-05 13:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2023-09-26, Petr Mladek <pmladek@suse.com> wrote:
> Anyway, the proposed behavior (agreed on LPC2022) seems
> to contradict the original plan from LPC 2019, see
> https://lore.kernel.org/all/87k1acz5rx.fsf@linutronix.de/
> Namely:
>
>   --- cut ---
>   3. Rather than defining emergency _messages_, we define an emergency
>   _state_ where the kernel wants to flush the messages immediately before
>   dying. Unlike oops_in_progress, this state will not be visible to
>   anything outside of the printk infrastructure.
>
>   4. When in emergency state, the kernel will use a new console callback
>   write_atomic() to flush the messages in whatever context the CPU is in
>   at that moment. Only consoles that implement the NMI-safe write_atomic()
>   will be able to flush in this state.
>   --- cut ---
>
> We wanted to flush it ASAP.

In 2019 we didn't have all the code yet. Yes, we assumed that flushing
each individual printk() call would be how to do it. But in 2022, during
a live demo with real code, we saw that it was a bad idea.

I disagree that the 2022 decisions contradict the 2019
decisions. We have added more details because now we have a real
implementation.

v2 establishes an emergency state before dying. v2 flushes the messages
safely before dying (and unsafely _immediately_ before dying). v2
flushes the messages in whatever context the CPU is in. v2 uses
write_atomic() to perform the flush.

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-05 12:51         ` John Ogness
@ 2023-10-06 12:51           ` Petr Mladek
  2023-10-06 12:53           ` Petr Mladek
  2023-10-06 15:52           ` Petr Mladek
  2 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-10-06 12:51 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Adding Linus into Cc. I would like to be sure about the flushing
of atomic consoles in panic context.

> During the demo at LPC2022 we had the situation that there was a large
> backlog when a WARN was hit. With current mainline the first line of the
> WARN is put into the ringbuffer and then the entire backlog is flushed
> before storing the rest of the WARN into the ringbuffer. At the time it
> was obvious that we should finish storing the WARN message and then
> start flushing the backlog.

This talks about the "emergency" context (WARN/OOPS/watchdog).
The system might be in big troubles but it would still try to continue.

Do we really want to defer the flush also for panic() context?

I ask because I was not on LPC 2022 in person and I do not remember
all details.

Anyway, the deferred flush works relatively well for the "emergency" context:

   + flushed from nbcon_atomic_exit()
   + printk kthread might emit the messages while they are being added

But it is tricky in panic(), see 8th patch at
https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de

   + nbcon_atomic_exit() is called only in one code path.

   + nbcon_atomic_flush_all() is used in other paths. It looks like
     a "Whack a mole" game to me.

   + messages are never emitted by printk kthread either because
     CPUs are stopped or the kthread is not allowed to get the lock[*]

I see only one positive of the explicit flush. The consoles would
not delay crash_kexec() and the crash dump might be closer to
the point where panic() was called.

Otherwise I see only negatives => IMHO, we want to flush atomic
consoles synchronously from printk() in panic().

Does anyone really want explicit flushes in panic()?

[*] Emitting messages is explicitly blocked on non-panic CPUs. It
    increases the chance that the panic CPU will be able to take
    the console lock in a safe way.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-05 12:51         ` John Ogness
  2023-10-06 12:51           ` panic context: was: " Petr Mladek
@ 2023-10-06 12:53           ` Petr Mladek
  2023-10-08 10:13             ` John Ogness
  2023-10-06 15:52           ` Petr Mladek
  2 siblings, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-10-06 12:53 UTC (permalink / raw)
  To: John Ogness, Linus Torvalds
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

(2nd attempt, with Linus really in Cc).

Adding Linus into Cc. I would like to be sure about the flushing
of atomic consoles in panic context.

> During the demo at LPC2022 we had the situation that there was a large
> backlog when a WARN was hit. With current mainline the first line of the
> WARN is put into the ringbuffer and then the entire backlog is flushed
> before storing the rest of the WARN into the ringbuffer. At the time it
> was obvious that we should finish storing the WARN message and then
> start flushing the backlog.

This talks about the "emergency" context (WARN/OOPS/watchdog).
The system might be in big troubles but it would still try to continue.

Do we really want to defer the flush also for panic() context?

I ask because I was not on LPC 2022 in person and I do not remember
all details.

Anyway, the deferred flush works relatively well for the "emergency" context:

   + flushed from nbcon_atomic_exit()
   + printk kthread might emit the messages while they are being added

But it is tricky in panic(), see 8th patch at
https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de

   + nbcon_atomic_exit() is called only in one code path.

   + nbcon_atomic_flush_all() is used in other paths. It looks like
     a "Whack a mole" game to me.

   + messages are never emitted by printk kthread either because
     CPUs are stopped or the kthread is not allowed to get the lock[*]

I see only one positive of the explicit flush. The consoles would
not delay crash_kexec() and the crash dump might be closer to
the point where panic() was called.

Otherwise I see only negatives => IMHO, we want to flush atomic
consoles synchronously from printk() in panic().

Does anyone really want explicit flushes in panic()?

[*] Emitting messages is explicitly blocked on non-panic CPUs. It
    increases the chance that the panic CPU will be able to take
    the console lock in a safe way.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-05 12:51         ` John Ogness
  2023-10-06 12:51           ` panic context: was: " Petr Mladek
  2023-10-06 12:53           ` Petr Mladek
@ 2023-10-06 15:52           ` Petr Mladek
  2 siblings, 0 replies; 44+ messages in thread
From: Petr Mladek @ 2023-10-06 15:52 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-10-05 14:57:47, John Ogness wrote:
> On 2023-09-25, Petr Mladek <pmladek@suse.com> wrote:
> > From my POV:
> >
> >    + It did not mention/explained "atomic write" at all
> 
> Agreed.
> 
> >    + It said that the urgent messages required immediate printing.
> >      And later, it said that they would get flushed later. These
> >      statements contradict each other.
> 
> I agree that the wording needs to be dramatically improved. The term
> "urgent message" was not meant to represent a single printk() call.
> 
> >    + The handling of priorities is not only about CPU nesting.
> >      The same rules should apply also when other CPU is printing
> >      messages in a higher priority context.
> 
> Sorry, I do not understand what you mean here. Each CPU is responsible
> for its own state. If another CPU is in a higher priority state, that
> CPU will be responsible for ensuring its own WARN/OOPS is stored and
> flushed.

You are right that my comment was confusing. I had in mind that
flushing one emergency context would flush all pending messages
from other CPUs and even from other emergency contexts. Your
explanation is better.

> (From v2 I see that the CPU does not try hard enough. I would
> fix that for v3.)

Yes, this should be fixed.

> There are 3 different types of priority:
> 
> 1. The printk record priority: KERN_ERR, KERN_WARNING, etc.
> 
> 2. The priority of a console owner: used for controlled takeovers.
> 
> 3. The priority state of a CPU: only elevated for urgent messages, used
> to store all urgent messages and then afterwards print directly by
> taking ownership of consoles.

I know that you want to distinguish 2 and 3. But I think that
they should be the same. I mean that nbcon_context_try_acquire()
should use the PRIO according to what context it is in.

Is there any situation where it should be different, please?

IMHO, it might simplify some logic when they are the same.

> I need to choose terminology carefully to make it easy to distinguish
> between these 3 types. v2 failed to name, describe, and document this
> correctly.

> >   + We must somewhere explain the "atomic context" and "atomic_write"
> >     callback. One important question is why it is atomic. Is it because it
> >
> >       + _can_ be called in atomic context?
> >       + _must_ be called in atomic context?
> 
> Its main reason for existing is because it can be called from atomic
> (including NMI) contexts. But like atomic_t, it can be used from any
> context.
> 
> >     It is called also from console_unlock() for boot messages
> >     so it need not be in atomic context.
> >
> >     What about renaming it to "nbcon_write" to avoid this confusion?
> 
> When we introduce the threads, there will be a 2nd callback (currently
> planned write_thread()). This callback is guaranteed to be called from a
> printing kthread, which for console drivers like fbcon will prove to be
> helpful in cleaning up its code.

I see.

> I will reserve the word "atomic" _only_ when talking about which
> printing callback is used. That should help to avoid associating the
> callback with a certain context or priority. But I think the name
> "write_atomic" is appropriate.

Sounds good.

> >> The "nesting" array is needed in order to support a feature that is not
> >> explained in the commit message: If nested OOPS/WARN/PANIC occur, only
> >> the outermost OOPS/WARN/PANIC will do the flushing. I will add this
> >> information to the commit message.
> >
> > What is the motivation for the feature, please?
> 
> During the demo at LPC2022 we had the situation that there was a large
> backlog when a WARN was hit. With current mainline the first line of the
> WARN is put into the ringbuffer and then the entire backlog is flushed
> before storing the rest of the WARN into the ringbuffer. At the time it
> was obvious that we should finish storing the WARN message and then
> start flushing the backlog.

This talks about a context using NBCON_PRIO_EMERGENCY. I am pretty sure
that we want to flush the messages synchronously from printk() in
panic(). Let's discuss this in the other thread with Linus in Cc [1].

> > My proposal:
> >
> > There are only two higher priority contexts:
> >
> >   + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()
> 
> This is the case with v2.

Ah, I see, nbcon_atomic_enter(NBCON_PRIO_PANIC) is called right after
setting panic_cpu. But this is actually another reason to get rid
of the nbcon_atomic_enter() call. Why?

  + The information about entering context with NBCON_PRIO_PANIC is
    already provided by panic_cpu variable.

  + Nesting is not possible because only one context is allowed
    to acquire panic_cpu.

  + nbcon_atomic_exit() is almost useless because it is called only
    in one code path. Explicit nbcon_atomic_flush_all() calls
    are needed in other code paths.

IMHO, getting rid of nbcon_atomic_enter(NBCON_PRIO_PANIC) would have
several advantages:

  + enter()/exit() API would be needed only for NBCON_PRIO_EMERGENCY.
    We could call it nbcon_emergency_enter()/exit() and avoid
    the confusing atomic_enter name.

  + Nesting is important only for NBCON_PRIO_EMERGENCY context =>
    the per-CPU variable might be a simple reference counter =>
    easier code and no need to remember the previous value
    and pass it to _exit() as a parameter.

> >   + NBCON_PRIO_EMERGENCY contex would require some enter/exit wrappers
> >     and tracking. But it does not necessarily need to be per-CPU
> >     variable.
> <
> > I think about adding "int printk_state" into struct task_struct.
> > It might be useful also for other things, e.g. for storing the last
> > log level of non-finished message. Entering section with enforced
> > minimal loglevel or so.
> 
> printk() calls are quite rare. And warning states are even more
> rare. IMHO adding such a field to every task is a huge price to
> pay. Also, printk operates in non-task contexts (hardirq, nmi). Although
> @current probably points to something existing, there could be some
> tricky corner cases.

Fair enough. Per-CPU variables are fine after all.

> My proposal:
> 
> You have provided excellent feedback regarding naming and
> documentation. Allow me to fix these things to clarify the various
> functions and their roles.

We should first agree on the approach in panic() in the other thread [1].

IMHO, the context using NBCON_PRIO_EMERGENCY should be the only one
where we need enter/exit and the deferred flush. In this case,
the per-CPU variable would be a simple nesting counter. And maybe
the explicit flush would be needed only in emergency context.
=> easier code, logic, naming.

[1] https://lore.kernel.org/r/ZSADUKp8oJ2Ws2vC@alley

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-06 12:53           ` Petr Mladek
@ 2023-10-08 10:13             ` John Ogness
  2023-10-09 15:32               ` Petr Mladek
  2023-10-16  8:58                 ` Dave Young
  0 siblings, 2 replies; 44+ messages in thread
From: John Ogness @ 2023-10-08 10:13 UTC (permalink / raw)
  To: Petr Mladek, Linus Torvalds
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Hi Petr,

On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
>> During the demo at LPC2022 we had the situation that there was a large
>> backlog when a WARN was hit. With current mainline the first line of the
>> WARN is put into the ringbuffer and then the entire backlog is flushed
>> before storing the rest of the WARN into the ringbuffer. At the time it
>> was obvious that we should finish storing the WARN message and then
>> start flushing the backlog.
>
> This talks about the "emergency" context (WARN/OOPS/watchdog).
> The system might be in big troubles but it would still try to continue.
>
> Do we really want to defer the flush also for panic() context?

We can start flushing right after the backtrace is in the
ringbuffer. But flushing the backlog _before_ putting the backtrace into
the ringbuffer was not desired because if there is a large backlog, the
machine may not survive to get the backtrace out. And in that case it
won't even be in the ringbuffer to be used by other debugging
tools.

> I ask because I was not on LPC 2022 in person and I do not remember
> all details.

The LPC2022 demo/talk was recorded:

https://www.youtube.com/watch?v=TVhNcKQvzxI

At 55:55 is where the situation occurred and triggered the conversation,
ultimately leading to this new feature.

You may also want to reread my summary:

https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

as well as Linus' follow-up message:

https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com

> But it is tricky in panic(), see 8th patch at
> https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
>
>    + nbcon_atomic_exit() is called only in one code path.

Correct. When panic() is complete and the machine goes into its infinite
loop. This is also the point where it will attempt an unsafe flush, if
it could not get the messages out yet.

>    + nbcon_atomic_flush_all() is used in other paths. It looks like
>      a "Whack a mole" game to me.

Several different outputs occur during panic(). The flush is everywhere
where something significant has been put into the ringbuffer and now it
would be OK to flush it.

>    + messages are never emitted by printk kthread either because
>      CPUs are stopped or the kthread is not allowed to get the lock

Correct.

> I see only one positive of the explicit flush. The consoles would
> not delay crash_exec() and the crash dump might be closer to
> the point where panic() was called.

It's only about getting the critical messages into the ringbuffer before
flushing. And since various things can go wrong during the many actions
within panic(), it makes sense to flush in between those actions.

> Otherwise I see only negatives => IMHO, we want to flush atomic
> consoles synchronously from printk() in panic().
>
> Does anyone really want explicit flushes in panic()?

So far you are the only one speaking against it. I expect as time goes
on it will get even more complex as it becomes tunable (also something
we talked about during the demo).

John

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-08 10:13             ` John Ogness
@ 2023-10-09 15:32               ` Petr Mladek
  2023-10-10 16:02                 ` John Ogness
  2023-10-16  8:58                 ` Dave Young
  1 sibling, 1 reply; 44+ messages in thread
From: Petr Mladek @ 2023-10-09 15:32 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

On Sun 2023-10-08 12:19:21, John Ogness wrote:
> Hi Petr,
> 
> On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
> >> During the demo at LPC2022 we had the situation that there was a large
> >> backlog when a WARN was hit. With current mainline the first line of the
> >> WARN is put into the ringbuffer and then the entire backlog is flushed
> >> before storing the rest of the WARN into the ringbuffer. At the time it
> >> was obvious that we should finish storing the WARN message and then
> >> start flushing the backlog.
> >
> > This talks about the "emergency" context (WARN/OOPS/watchdog).
> > The system might be in big troubles but it would still try to continue.
> >
> > Do we really want to defer the flush also for panic() context?
> 
> We can start flushing right after the backtrace is in the
> ringbuffer. But flushing the backlog _before_ putting the backtrace into
> the ringbuffer was not desired because if there is a large backlog, the
> machine may not survive to get the backtrace out. And in that case it
> won't even be in the ringbuffer to be used by other debugging
> tools.

We really have to distinguish emergency and panic context!

> > I ask because I was not on LPC 2022 in person and I do not remember
> > all details.
> 
> The LPC2022 demo/talk was recorded:
> 
> https://www.youtube.com/watch?v=TVhNcKQvzxI
> 
> At 55:55 is where the situation occurred and triggered the conversation,
> ultimately leading to this new feature.

Thanks for the link. My understanding is that the following scenario
has happened:

1. System was booting and messages were being flushed using the kthread.

2. WARN() happened. It printed the 1st line, took over the
   per-console lock and started flushing the backlog. There were
   still many pending messages from the boot.

3. NMI came and caused panic(). The panic context printed its first line,
   took over the console from the WARN context, flushed the rest
   of the backlog and continued printing/flushing more messages from
   the panic() context.


Problem:

People complained that they saw only the first line from WARN().
The related detailed info, including backtrace, was missing.

It was because panic() interrupted WARN() before it was able
to flush the backlog and print/flush all WARN() messages.


Proposed solution:

WARN()/emergency context should first store the messages and
flush them at the end.

It would increase the chance that all WARN() messages will
be stored in the ring buffer before NMI/panic() is called.

panic() would then flush all pending messages including
the stored WARN() messages.


Important:

The problem is that panic() interrupted WARN().

There is _no_ problem with interrupting panic().
Let me repeat, nested panic() is not possible!


> You may also want to reread my summary:
>
> https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

Again, thanks for the pointer. Let me paste 2 paragraphs here:

<paste>
- Printing the backlog is important! If some emergency situation occurs,
  make sure the backlog gets printed.

- When an emergency occurs, put the full backtrace into the ringbuffer
  before flushing any backlog. This ensures that the backtrace makes it
  into the ringbuffer in case a panic occurs while flushing the backlog.
</paste>

My understanding is:

1st paragraph is the reason why:

   + we have three priorities: normal, emergency, panic

   + messages in normal context might be offloaded to kthread

   + emergency and panic context should try to flush the messages
     from this context.


The 2nd paragraph says that the emergency context should first store
the messages and flush them later. And the important part:

     "in case a panic occurs while flushing the backlog."

     => panic might interrupt emergency

It clearly distinguishes emergency and panic context.


> as well as Linus' follow-up message:
> 
> https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com

IMHO, the important part is:

<paste>
Yeah, I really liked the notion of doing the oops with just filling
the back buffer but still getting it printed out if something goes
wrong in the middle.
</paste>

He was talking about an oops => emergency context.

And he wanted to get it out when something goes wrong in the middle
   => a panic in the middle?


And another paragraph:

<paste>
I doubt it ends up being an issue in practice, but basically I wanted
to just pipe up and say that the exact details of how much of the back
buffer needs to be flushed first _could_ be tweaked if it ever does
come up as an issue.
</paste>

Linus doubted that a too-long backlog would be a problem in
practice. And I doubt it as well.

And this is my view: the deferred flush is trying to solve a corner
case while we are forgetting what blocked the printk kthreads for
more than 10 years.

> > But it is tricky in panic(), see 8th patch at
> > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
> >
> >    + nbcon_atomic_exit() is called only in one code path.
> 
> Correct. When panic() is complete and the machine goes into its infinite
> loop. This is also the point where it will attempt an unsafe flush, if
> it could not get the messages out yet.
> 
> >    + nbcon_atomic_flush_all() is used in other paths. It looks like
> >      a "Whack a mole" game to me.
> 
> Several different outputs occur during panic(). The flush is everywhere
> where something significant has been put into the ringbuffer and now it
> would be OK to flush it.

No, we could _never_ say that the flush is everywhere where something
important happens.

panic() might fail anywhere. The last printed message might be
an important clue. And it can be any message.


> >    + messages are never emitted by printk kthread either because
> >      CPUs are stopped or the kthread is not allowed to get the lock
> 
> Correct.
> 
> > I see only one positive of the explicit flush. The consoles would
> > not delay crash_kexec() and the crash dump might be closer to
> > the point where panic() was called.
> 
> It's only about getting the critical messages into the ringbuffer before
> flushing. And since various things can go wrong during the many actions
> within panic(), it makes sense to flush in between those actions.

I am glad that we agree that "various things can go wrong during panic".

Are we sure that the patchset added the explicit flush right after
each possible problematic place? What about crash_kexec(), the
various kmsg dumpers, and the notifiers?


> > Otherwise I see only negatives => IMHO, we want to flush atomic
> > consoles synchronously from printk() in panic().
> >
> > Does anyone really want explicit flushes in panic()?
> 
> So far you are the only one speaking against it. I expect as time goes
> on it will get even more complex as it becomes tunable (also something
> we talked about during the demo).

From my POV:

1. We are just two people discussing it at the moment
      => word against word.

2. Please, try to read my reply again. I agreed with delaying the
   flush in emergency context.

   But I am strongly against explicit flushes during panic at
   least in the initial implementation.


3. IMHO, there might be a lot of misunderstanding caused by
   the term "emergency" context. Some people might see it as
   OOPS/WARN/panic, and others might want to handle panic in
   a special way.

   I see it as:

      + emergency - huge danger; a medical check is needed, but
	      the patient might survive

      + panic - the patient is about to die; try to get some
	      secrets from him before he dies.

   Any sentence of the secret might be important. It would be a pity
   to ignore his last A4 page just because it was not complete.


Sigh, I did my best to explain why the nesting problem is only in
the emergency context and why panic is different.

Feel free to ask if anything is still unclear.


Anyway, I am _not_ going to accept the explicit flushes in panic()
unless you show me a problem with interrupted panic() or
Linus explicitly says that the explicit flushes make sense.

Best Regards,
Petr


* Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-09 15:32               ` Petr Mladek
@ 2023-10-10 16:02                 ` John Ogness
  0 siblings, 0 replies; 44+ messages in thread
From: John Ogness @ 2023-10-10 16:02 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

On 2023-10-09, Petr Mladek <pmladek@suse.com> wrote:
> We really have to distinguish emergency and panic context!

OK.

>> The LPC2022 demo/talk was recorded:
>> 
>> https://www.youtube.com/watch?v=TVhNcKQvzxI
>> 
>> At 55:55 is where the situation occurred and triggered the conversation,
>> ultimately leading to this new feature.
>
> Thanks for the link. My understanding is that the following scenario
> has happened:
>
> 1. System was booting and messages were being flushed using the kthread.
>
> 2. WARN() happened. It printed the 1st line, took over the per-console
>    console lock and started flushing the backlog. There were still
>    many pending messages from the boot.
>
> 3. NMI came and caused panic(). The panic context printed its first line,
>    took over the console from the WARN context, flushed the rest
>    of the backlog and continued printing/flushing more messages from
>    the panic() context.
>
>
> Problem:
>
> People complained that they saw only the first line from WARN().
> The related detailed info, including backtrace, was missing.
>
> It was because panic() interrupted WARN() before it was able
> to flush the backlog and print/flush all WARN() messages.

Thanks for taking the time to review it in detail.

> Proposed solution:
>
> WARN()/emergency context should first store the messages and
> flush them at the end.
>
> It would increase the chance that all WARN() messages will
> be stored in the ring buffer before NMI/panic() is called.
>
> panic() would then flush all pending messages including
> the stored WARN() messages.

OK.

> Important:
>
> The problem is that panic() interrupted WARN().

Ack.

>> You may also want to reread my summary:
>>
>> https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
>
> Again, thanks for the pointer. Let me paste 2 paragraphs here:
>
> <paste>
> - Printing the backlog is important! If some emergency situation occurs,
>   make sure the backlog gets printed.
>
> - When an emergency occurs, put the full backtrace into the ringbuffer
>   before flushing any backlog. This ensures that the backtrace makes it
>   into the ringbuffer in case a panic occurs while flushing the backlog.
> </paste>
>
> My understanding is:
>
> 1st paragraph is the reason why:
>
>    + we have three priorities: normal, emergency, panic
>
>    + messages in normal context might be offloaded to kthread
>
>    + emergency and panic context should try to flush the messages
>      from this context.
>
>
> The 2nd paragraph says that the emergency context should first store
> the messages and flush them later. And the important part:
>
>      "in case a panic occurs while flushing the backlog."
>
>      => panic might interrupt emergency
>
> It clearly distinguishes emergency and panic context.
>
>
>> as well as Linus' follow-up message:
>> 
>> https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com
>
> IMHO, the important part is:
>
> <paste>
> Yeah, I really liked the notion of doing the oops with just filling
> the back buffer but still getting it printed out if something goes
> wrong in the middle.
> </paste>
>
> He was talking about oops => emergency context
>
> Also he wanted to get it out when something goes wrong in the middle
>    => panic in the middle ?
>
>
> And another paragraph:
>
> <paste>
> I doubt it ends up being an issue in practice, but basically I wanted
> to just pipe up and say that the exact details of how much of the back
> buffer needs to be flushed first _could_ be tweaked if it ever does
> come up as an issue.
> </paste>
>
> Linus doubted that a too-long backlog would be a problem in
> practice. And I doubt it as well.
>
> And this is my view: the deferred flush is trying to solve a corner
> case while we are forgetting what blocked the printk kthreads for
> more than 10 years.

OK. Thank you for the detailed analysis.

For v3 I will do something similar to what you proposed [0], except that
I will use a per-cpu variable (to track printk emergency nesting)
instead of adding a new field to the task struct.

John Ogness

[0] https://lore.kernel.org/lkml/ZRLBxsXPCym2NC5Q@alley/


* Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-08 10:13             ` John Ogness
@ 2023-10-16  8:58                 ` Dave Young
  2023-10-16  8:58                 ` Dave Young
  1 sibling, 0 replies; 44+ messages in thread
From: Dave Young @ 2023-10-16  8:58 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, kexec, bhe,
	prudo, ebiederm, vgoyal

[Added more people in cc]

On 10/08/23 at 12:19pm, John Ogness wrote:
> Hi Petr,
> 
> On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
> >> During the demo at LPC2022 we had the situation that there was a large
> >> backlog when a WARN was hit. With current mainline the first line of the
> >> WARN is put into the ringbuffer and then the entire backlog is flushed
> >> before storing the rest of the WARN into the ringbuffer. At the time it
> >> was obvious that we should finish storing the WARN message and then
> >> start flushing the backlog.
> >
> > This talks about the "emergency" context (WARN/OOPS/watchdog).
> > The system might be in big troubles but it would still try to continue.
> >
> > Do we really want to defer the flush also for panic() context?
> 
> We can start flushing right after the backtrace is in the
> ringbuffer. But flushing the backlog _before_ putting the backtrace into
> the ringbuffer was not desired because if there is a large backlog, the
> machine may not survive to get the backtrace out. And in that case it
> won't even be in the ringbuffer to be used by other debugging
> tools.
> 
> > I ask because I was not on LPC 2022 in person and I do not remember
> > all details.
> 
> The LPC2022 demo/talk was recorded:
> 
> https://www.youtube.com/watch?v=TVhNcKQvzxI
> 
> At 55:55 is where the situation occurred and triggered the conversation,
> ultimately leading to this new feature.
> 
> You may also want to reread my summary:
> 
> https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
> 
> as well as Linus' follow-up message:
> 
> https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com
> 
> > But it is tricky in panic(), see 8th patch at
> > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
> >
> >    + nbcon_atomic_exit() is called only in one code path.
> 
> Correct. When panic() is complete and the machine goes into its infinite
> loop. This is also the point where it will attempt an unsafe flush, if
> it could not get the messages out yet.
> 
> >    + nbcon_atomic_flush_all() is used in other paths. It looks like
> >      a "Whack a mole" game to me.
> 
> Several different outputs occur during panic(). The flush is everywhere
> where something significant has been put into the ringbuffer and now it
> would be OK to flush it.
> 
> >    + messages are never emitted by printk kthread either because
> >      CPUs are stopped or the kthread is not allowed to get the lock
> 
> Correct.
> 
> > I see only one positive of the explicit flush. The consoles would
> > not delay crash_kexec() and the crash dump might be closer to
> > the point where panic() was called.
> 
> It's only about getting the critical messages into the ringbuffer before
> flushing. And since various things can go wrong during the many actions
> within panic(), it makes sense to flush in between those actions.
> 
> > Otherwise I see only negatives => IMHO, we want to flush atomic
> > consoles synchronously from printk() in panic().
> >
> > Does anyone really want explicit flushes in panic()?
> 
> So far you are the only one speaking against it. I expect as time goes
> on it will get even more complex as it becomes tunable (also something
> we talked about during the demo).

Flushing consoles in the panic kexec case does not sound good, but I
have no deep understanding of the atomic printk series, so I have
added the kexec list and reviewers in Cc.

> 
> John
> 

Thanks
Dave




* Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
  2023-10-16  8:58                 ` Dave Young
@ 2023-10-16 10:09                   ` John Ogness
  -1 siblings, 0 replies; 44+ messages in thread
From: John Ogness @ 2023-10-16 10:09 UTC (permalink / raw)
  To: Dave Young
  Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, kexec, bhe,
	prudo, ebiederm, vgoyal

Hi Dave,

On 2023-10-16, Dave Young <dyoung@redhat.com> wrote:
>> > Does anyone really want explicit flushes in panic()?
>> 
>> So far you are the only one speaking against it. I expect as time
>> goes on it will get even more complex as it becomes tunable (also
>> something we talked about during the demo).
>
> Flush consoles in panic kexec case sounds not good, but I have no deep
> understanding about the atomic printk series, added kexec list and
> reviewers in cc.

Currently every printk() message tries to flush immediately.

This series introduced a new method of first allowing a set of printk()
messages to be stored to the ringbuffer and then flushing the full
set. That is what this discussion was about.

The issue with allowing a set of printk() messages to be stored is that
you need to explicitly mark in code where the actual flushing should
occur. Petr's argument is that we do not want to insert "flush points"
into the panic() function and instead we should do as we do now: flush
each printk() message immediately.

In the end (for my upcoming v3 series) I agreed with Petr. We will
continue to keep things as they are now: flush each printk() message
immediately.

Currently consoles try to flush unsafely before kexec. With the atomic
printk series our goal is to only perform _safe_ flushing until all
other panic operations are complete. Only at the very end of panic()
would unsafe flushing be attempted.

John



Thread overview: 44+ messages
2023-09-19 23:08 [PATCH printk v2 00/11] wire up nbcon atomic printing John Ogness
2023-09-19 23:08 ` [PATCH printk v2 01/11] printk: Make console_is_usable() available to nbcon John Ogness
2023-09-22  8:33   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 02/11] printk: Let console_is_usable() handle nbcon John Ogness
2023-09-22  8:37   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 03/11] printk: Add @flags argument for console_is_usable() John Ogness
2023-09-22  8:41   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections John Ogness
2023-09-22  9:33   ` Petr Mladek
2023-09-25  9:25     ` John Ogness
2023-09-25 16:04       ` Petr Mladek
2023-10-05 12:51         ` John Ogness
2023-10-06 12:51           ` panic context: was: " Petr Mladek
2023-10-06 12:53           ` Petr Mladek
2023-10-08 10:13             ` John Ogness
2023-10-09 15:32               ` Petr Mladek
2023-10-10 16:02                 ` John Ogness
2023-10-16  8:58               ` Dave Young
2023-10-16  8:58                 ` Dave Young
2023-10-16 10:09                 ` John Ogness
2023-10-16 10:09                   ` John Ogness
2023-10-06 15:52           ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 05/11] printk: nbcon: Provide function for atomic flushing John Ogness
2023-09-22 12:32   ` Petr Mladek
2023-09-25 11:11     ` John Ogness
2023-09-19 23:08 ` [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console " John Ogness
2023-09-22 17:41   ` Petr Mladek
2023-09-25 13:37     ` John Ogness
2023-09-26 12:14       ` Petr Mladek
2023-10-05 13:59         ` John Ogness
2023-09-19 23:08 ` [PATCH printk v2 07/11] printk: nbcon: Wire up nbcon into console_flush_all() John Ogness
2023-09-26 11:34   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 08/11] panic: Add atomic write enforcement to warn/panic John Ogness
2023-09-27 12:02   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops John Ogness
2023-09-19 23:36   ` John Ogness
2023-09-20 13:28   ` Andy Shevchenko
2023-09-20 14:20     ` John Ogness
2023-09-20 14:45       ` Andy Shevchenko
2023-09-27 13:15   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls John Ogness
2023-09-27 15:00   ` Petr Mladek
2023-09-19 23:08 ` [PATCH printk v2 11/11] lockdep: Add atomic write enforcement for lockdep splats John Ogness
2023-09-29  8:31   ` Petr Mladek
