linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch RFC 00/29] printk: A new approach - WIP
@ 2022-09-10 22:27 Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 01/29] printk: Make pr_flush() static Thomas Gleixner
                   ` (32 more replies)
  0 siblings, 33 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Folks!

After the recent revert of the threaded printk patches, John and I sat down
and did a thorough analysis of the failure(s). It turned out that there are
several reasons why this failed:

  1) The blurry semantics of console lock which triggers unpleasant
     memories of BKL. That in turn inspired me to flag the new consoles
     with CON_NOBKL and use the nobkl theme throughout the series as I
     could not come up with a better name. :)

  2) The assumption that seperating a printk thread out from console lock,
     but at the same time partially depending on console lock. That's not
     really working out.

  3) The operation of consoles and printk threads was depending solely on
     global state and that state is on/off so it does not really qualify as
     stateful and is therefore not really useful to create a stable
     mechanism.

So I have to correct myself and admit that the revert was the Right Thing
to do. My other statements in that mail [1] still stand.

Nevertheless threaded printing is not only the last missing prerequisite
for enabling RT in mainline, it is also a long standing request from the
enterprise departement.

Synchronous printk is slow especially with serial consoles (which are at
the same time the most reliable ones) where a single character takes ~87
microseconds at 115200 Baud. With enough noise this causes lockup detectors
to trigger and the current "load balancing" approach of handing over the
consoles after one line to an different CPU is just a bandaid which "works"
by chance and makes CPUs spinwait for milliseconds. 

After taking a step back we decided to go for a radically different
approach:

  Create infrastructure which is designed to overcome the current
  console/printk shortcomings and convert drivers one by one.

  A CPU hotplug dejavu. And yes we should have gone for that right away.
  Water down the bridge...

The infrastructure we implemented comes with the following properties:

  - It shares the console list, but only for registration/unregistration
    purposes. Consoles which are utilizing the new infrastructure are
    ignored by the existing mechanisms and vice versa. This allows to
    reuse as much code as possible and preserves the printk semantics
    except for the threaded printing part.

  - The console list walk becomes SRCU protected to avoid any restrictions
    on contexts

  - Consoles become stateful to handle handover and takeover in a graceful
    way.

  - All console state operations rely solely on atomic*_try_cmpxchg() so
    they work in any context.

  - Console locking is per console to allow friendly handover or "safe"
    hostile takeover in emergency/panic situations. Console lock is not
    relevant for consoles converted to the new infrastructure.

  - The core provides interfaces for console drivers to query whether they
    can proceed and to denote 'unsafe' sections in the console state, which
    is unavoidable for some console drivers.

    In fact there is not a single regular (non-early) console driver today
    which is reentrancy safe under all circumstances, which enforces that
    NMI context is excluded from printing directly. TBH, that's a sad state
    of affairs.

    The unsafe state bit allows to make informed decisions in the core
    code, e.g. to postpone printing if there are consoles available which
    are safe to acquire. In case of a hostile takeover the unsafe state bit
    is handed to the atomic write callback so that the console driver can
    act accordingly.
    
  - Printing is delegated to a per console printer thread except for the
    following situations:

       - Early boot
       - Emergency printing (WARN/OOPS)
       - Panic printing

The integration is complete, but without any fancy things, like locking all
consoles when entering a WARN, print and unlock when done. Such things only
make sense once all drivers are converted over because that conflicts with
the way how the existing console lock mechanics work.

For testing we used the most simple driver: a hacked up version of the
early uart8250 console as we wanted to concentrate on validating the core
mechanisms of friendly handover and hostile takeovers instead of dealing
with the horrors of port locks or whatever at the same time. That's the
next challenge. Hack patch will be provided in a reply.

Here is sample output where we let the atomic and thread write functions
prepend each line with the printing context (A=atomic, T=thread):

A[    0.394066] ... fixed-purpose events:   3
A[    0.395130] ... event mask:             000000070000000f

End of early boot, thread starts

TA[    0.396821] rcu: Hierarchical SRCU implementation.

^^ Thread starts printing and immediately raises a warning, so atomic
   context at the emergency priority takes over and continues printing.

   This is a forceful, but safe takeover scenario as the WARN context
   is obviously on the same CPU as the printing thread where friendly
   is not an option.

A[    0.397133] rcu: 	Max phase no-delay instances is 400.
A[    0.397640] ------------[ cut here ]------------
A[    0.398072] WARNING: CPU: 0 PID: 13 at drivers/tty/serial/8250/8250_early.c:123 __early_serial8250_write.isra.0+0x80/0xa0
....
A[    0.440131]  ret_from_fork+0x1f/0x30
A[    0.441133]  </TASK>
A[    0.441867] ---[ end trace 0000000000000000 ]---
T[    0.443493] smp: Bringing up secondary CPUs ...

After the warning the thread continues printing.

....

T[    1.916873] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
T[    1.918719] md: Waiting for all devices to be available before autod

A[    1.918719] md: Waiting for all devices to be available before autodetect

System panics because it can't find a root file system. Panic printing
takes over the console from the printer thread immediately and reprints
the interrupted line.

This case is a friendly handover from the printing thread to the panic
context because the printing thread was not running on the panic CPU, but
handed over gracefully.

A[    1.919942] md: If you don't use raid, use raid=noautodetect
A[    1.921030] md: Autodetecting RAID arrays.
A[    1.921919] md: autorun ...
A[    1.922686] md: ... autorun DONE.
A[    1.923761] /dev/root: Can't open blockdev

So far the implemented state handling machinery holds up on the various
handover and hostile takeover situations we enforced for testing.

Hostile takeover is nevertheless a special case. If the driver is in an
unsafe region that's something which needs to be dealt with per driver.
There is not much the core code can do here except of trying a friendly
handover first and only enforcing it after a timeout or not trying to print
on such consoles.

This needs some thought, but we explicitely did not implement any takeover
policy into the core state handling mechanism as this is really a decision
which needs to be made at the call site. See patch 28.

We are soliciting feedback on that approach and we hope that we can
organize a BOF in Dublin on short notice.

The series is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git printk

The series has the following parts:

   Patches  1 - 5:   Cleanups

   Patches  6 - 12:  Locking and list conversion

   Patches 13 - 18:  Improved output buffer handling to prepare for
   	      	     code sharing

   Patches 19 - 29:  New infrastructure implementation

Most of the preparatory patches 1-18 have probably a value on their own.

Don't be scared about the patch stats below. We added kernel doc and
extensive comments to the code:

  kernel/printk/printk_nobkl.c: Code: 668 lines, Comments: 697 lines, Ratio: 1:1.043

Of course the code is trivial and straight forward as any other facility
which has to deal with concurrency and the twist of being safe in any
context. :)

Comments welcome.

Thanks,

	tglx
---
[1] https://lore.kernel.org/lkml/87r11qp63n.ffs@tglx/
---
 arch/parisc/include/asm/pdc.h  |    2 
 arch/parisc/kernel/pdc_cons.c  |   53 -
 arch/parisc/kernel/traps.c     |   17 
 b/kernel/printk/printk_nobkl.c | 1564 +++++++++++++++++++++++++++++++++++++++++
 drivers/tty/serial/kgdboc.c    |    7 
 drivers/tty/tty_io.c           |    6 
 fs/proc/consoles.c             |   12 
 fs/proc/kmsg.c                 |    2 
 include/linux/console.h        |  375 +++++++++
 include/linux/printk.h         |    9 
 include/linux/syslog.h         |    3 
 kernel/debug/kdb/kdb_io.c      |    7 
 kernel/panic.c                 |   12 
 kernel/printk/printk.c         |  485 ++++++++----
 14 files changed, 2304 insertions(+), 250 deletions(-)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 01/29] printk: Make pr_flush() static
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 11:27   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 02/29] printk: Declare log_wait properly Thomas Gleixner
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

No user outside the printk code and no reason to export this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/printk.h |    7 -------
 kernel/printk/printk.c |    5 +++--
 2 files changed, 3 insertions(+), 9 deletions(-)

--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -169,8 +169,6 @@ extern void __printk_safe_exit(void);
 #define printk_deferred_enter __printk_safe_enter
 #define printk_deferred_exit __printk_safe_exit
 
-extern bool pr_flush(int timeout_ms, bool reset_on_progress);
-
 /*
  * Please don't use printk_ratelimit(), because it shares ratelimiting state
  * with all other unrelated printk_ratelimit() callsites.  Instead use
@@ -221,11 +219,6 @@ static inline void printk_deferred_exit(
 {
 }
 
-static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
-{
-	return true;
-}
-
 static inline int printk_ratelimit(void)
 {
 	return 0;
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2296,6 +2296,7 @@ asmlinkage __visible int _printk(const c
 }
 EXPORT_SYMBOL(_printk);
 
+static bool pr_flush(int timeout_ms, bool reset_on_progress);
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);
 
 #else /* CONFIG_PRINTK */
@@ -2330,6 +2331,7 @@ static void call_console_driver(struct c
 {
 }
 static bool suppress_message_printing(int level) { return false; }
+static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
 
 #endif /* CONFIG_PRINTK */
@@ -3438,11 +3440,10 @@ static bool __pr_flush(struct console *c
  * Context: Process context. May sleep while acquiring console lock.
  * Return: true if all enabled printers are caught up.
  */
-bool pr_flush(int timeout_ms, bool reset_on_progress)
+static bool pr_flush(int timeout_ms, bool reset_on_progress)
 {
 	return __pr_flush(NULL, timeout_ms, reset_on_progress);
 }
-EXPORT_SYMBOL(pr_flush);
 
 /*
  * Delayed printk version, for scheduler-internal messages:


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 02/29] printk: Declare log_wait properly
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 01/29] printk: Make pr_flush() static Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 11:29   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers Thomas Gleixner
                   ` (30 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

kernel/printk/printk.c:365:1: warning: symbol 'log_wait' was not declared. Should it be static?

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 fs/proc/kmsg.c         |    2 --
 include/linux/syslog.h |    3 +++
 2 files changed, 3 insertions(+), 2 deletions(-)

--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -17,8 +17,6 @@
 
 #include <asm/io.h>
 
-extern wait_queue_head_t log_wait;
-
 static int kmsg_open(struct inode * inode, struct file * file)
 {
 	return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -8,6 +8,8 @@
 #ifndef _LINUX_SYSLOG_H
 #define _LINUX_SYSLOG_H
 
+#include <linux/wait.h>
+
 /* Close the log.  Currently a NOP. */
 #define SYSLOG_ACTION_CLOSE          0
 /* Open the log. Currently a NOP. */
@@ -35,5 +37,6 @@
 #define SYSLOG_FROM_PROC             1
 
 int do_syslog(int type, char __user *buf, int count, int source);
+extern wait_queue_head_t log_wait;
 
 #endif /* _LINUX_SYSLOG_H */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 01/29] printk: Make pr_flush() static Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 02/29] printk: Declare log_wait properly Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 11:33   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles Thomas Gleixner
                   ` (29 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |    9 ---------
 1 file changed, 9 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -220,9 +220,6 @@ int devkmsg_sysctl_set_loglvl(struct ctl
 }
 #endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
 
-/* Number of registered extended console drivers. */
-static int nr_ext_console_drivers;
-
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -3188,9 +3185,6 @@ void register_console(struct console *ne
 		console_drivers->next = newcon;
 	}
 
-	if (newcon->flags & CON_EXTENDED)
-		nr_ext_console_drivers++;
-
 	newcon->dropped = 0;
 	if (newcon->flags & CON_PRINTBUFFER) {
 		/* Get a consistent copy of @syslog_seq. */
@@ -3256,9 +3250,6 @@ int unregister_console(struct console *c
 	if (res)
 		goto out_disable_unlock;
 
-	if (console->flags & CON_EXTENDED)
-		nr_ext_console_drivers--;
-
 	/*
 	 * If this isn't the last console and it has CON_CONSDEV set, we
 	 * need to set it on the next preferred console.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (2 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 11:40   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init Thomas Gleixner
                   ` (28 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

The comment about unregistering boot consoles is just not matching the
reality. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3213,9 +3213,6 @@ void register_console(struct console *ne
 	if (bootcon_enabled &&
 	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
 	    !keep_bootcon) {
-		/* We need to iterate through all boot consoles, to make
-		 * sure we print everything out, before we unregister them.
-		 */
 		for_each_console(con)
 			if (con->flags & CON_BOOT)
 				unregister_console(con);


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (3 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 11:41   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
                   ` (27 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

This variable cannot change post boot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -430,7 +430,7 @@ static struct printk_ringbuffer *prb = &
  * per_cpu_areas are initialised. This variable is set to true when
  * it's safe to access per-CPU data.
  */
-static bool __printk_percpu_data_ready __read_mostly;
+static bool __printk_percpu_data_ready __ro_after_init;
 
 bool printk_percpu_data_ready(void)
 {


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (4 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 12:05   ` Sergey Senozhatsky
                     ` (2 more replies)
  2022-09-10 22:27 ` [patch RFC 07/29] printk: Convert console list walks for readers to list lock Thomas Gleixner
                   ` (26 subsequent siblings)
  32 siblings, 3 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Unprotected list walks are a brilliant idea. Especially in the context of
hotpluggable consoles.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   30 ++++++++++++++++--
 kernel/printk/printk.c  |   79 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 93 insertions(+), 16 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -157,10 +157,34 @@ struct console {
 	struct	 console *next;
 };
 
-/*
- * for_each_console() allows you to iterate on each console
+#ifdef CONFIG_LOCKDEP
+extern void lockdep_assert_console_list_lock_held(void);
+#else
+static inline void lockdep_assert_console_list_lock_held(void) { }
+#endif
+
+extern void console_list_lock(void) __acquires(console_mutex);
+extern void console_list_unlock(void) __releases(console_mutex);
+
+/**
+ * for_each_registered_console() - Iterator over registered consoles
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Requires console_list_lock to be held. Can only be invoked from
+ * preemptible context.
+ */
+#define for_each_registered_console(con)				\
+	lockdep_assert_console_list_lock_held();			\
+	for (con = console_drivers; con != NULL; con = con->next)
+
+/**
+ * for_each_console() - Iterator over registered consoles
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Requires console_lock to be held which guarantees that the
+ * list is immutable.
  */
-#define for_each_console(con) \
+#define for_each_console(con)						\
 	for (con = console_drivers; con != NULL; con = con->next)
 
 extern int console_set_on_cmdline;
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -79,10 +79,14 @@ int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
 /*
- * console_sem protects the console_drivers list, and also
- * provides serialisation for access to the entire console
- * driver system.
+ * console_sem protects the console_drivers list, and also provides
+ * serialization for access to the entire console driver system.
+ *
+ * console_mutex serializes register/unregister. console_sem has to be
+ * taken for any list manipulation inside the console_mutex locked
+ * section to keep the console BKL machinery happy.
  */
+static DEFINE_MUTEX(console_mutex);
 static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
@@ -103,6 +107,12 @@ static int __read_mostly suppress_panic_
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
+
+void lockdep_assert_console_list_lock_held(void)
+{
+	lockdep_assert_held(&console_mutex);
+}
+
 #endif
 
 enum devkmsg_log_bits {
@@ -220,6 +230,26 @@ int devkmsg_sysctl_set_loglvl(struct ctl
 }
 #endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
 
+/**
+ * console_list_lock - Lock the console list
+ *
+ * For non-console related list walks, e.g. procfs, sysfs...
+ */
+void console_list_lock(void)
+{
+	mutex_lock(&console_mutex);
+}
+
+/**
+ * console_list_unlock - Unlock the console list
+ *
+ * Counterpart to console_list_lock()
+ */
+void console_list_unlock(void)
+{
+	mutex_unlock(&console_mutex);
+}
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -2978,17 +3008,21 @@ struct tty_driver *console_device(int *i
 void console_stop(struct console *console)
 {
 	__pr_flush(console, 1000, true);
+	console_list_lock();
 	console_lock();
 	console->flags &= ~CON_ENABLED;
 	console_unlock();
+	console_list_unlock();
 }
 EXPORT_SYMBOL(console_stop);
 
 void console_start(struct console *console)
 {
+	console_list_lock();
 	console_lock();
 	console->flags |= CON_ENABLED;
 	console_unlock();
+	console_list_unlock();
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -3081,6 +3115,8 @@ static void try_enable_default_console(s
 	       (con->flags & CON_BOOT) ? "boot" : "",	\
 	       con->name, con->index, ##__VA_ARGS__)
 
+static int console_unregister_locked(struct console *console);
+
 /*
  * The console driver calls this routine during kernel initialization
  * to register the console printing procedure with printk() and to
@@ -3107,13 +3143,14 @@ void register_console(struct console *ne
 	bool realcon_enabled = false;
 	int err;
 
-	for_each_console(con) {
+	console_list_lock();
+	for_each_registered_console(con) {
 		if (WARN(con == newcon, "console '%s%d' already registered\n",
 					 con->name, con->index))
-			return;
+			goto unlock;
 	}
 
-	for_each_console(con) {
+	for_each_registered_console(con) {
 		if (con->flags & CON_BOOT)
 			bootcon_enabled = true;
 		else
@@ -3124,7 +3161,7 @@ void register_console(struct console *ne
 	if (newcon->flags & CON_BOOT && realcon_enabled) {
 		pr_info("Too late to register bootconsole %s%d\n",
 			newcon->name, newcon->index);
-		return;
+		goto unlock;
 	}
 
 	/*
@@ -3155,7 +3192,7 @@ void register_console(struct console *ne
 
 	/* printk() messages are not printed to the Braille console. */
 	if (err || newcon->flags & CON_BRL)
-		return;
+		goto unlock;
 
 	/*
 	 * If we have a bootconsole, and are switching to a real console,
@@ -3209,14 +3246,17 @@ void register_console(struct console *ne
 	if (bootcon_enabled &&
 	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
 	    !keep_bootcon) {
-		for_each_console(con)
+		for_each_console(con) {
 			if (con->flags & CON_BOOT)
-				unregister_console(con);
+				console_unregister_locked(con);
+		}
 	}
+unlock:
+	console_list_unlock();
 }
 EXPORT_SYMBOL(register_console);
 
-int unregister_console(struct console *console)
+static int console_unregister_locked(struct console *console)
 {
 	struct console *con;
 	int res;
@@ -3269,6 +3309,16 @@ int unregister_console(struct console *c
 
 	return res;
 }
+
+int unregister_console(struct console *console)
+{
+	int res;
+
+	console_list_lock();
+	res = console_unregister_locked(console);
+	console_list_unlock();
+	return res;
+}
 EXPORT_SYMBOL(unregister_console);
 
 /*
@@ -3320,7 +3370,8 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
-	for_each_console(con) {
+	console_list_lock();
+	for_each_registered_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;
 
@@ -3337,9 +3388,11 @@ static int __init printk_late_init(void)
 			 */
 			pr_warn("bootconsole [%s%d] uses init memory and must be disabled even before the real one is ready\n",
 				con->name, con->index);
-			unregister_console(con);
+			console_unregister_locked(con);
 		}
 	}
+	console_list_unlock();
+
 	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
 					console_cpu_notify);
 	WARN_ON(ret < 0);


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 07/29] printk: Convert console list walks for readers to list lock
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (5 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 12:46   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 08/29] parisc: Put console abuse into one place Thomas Gleixner
                   ` (25 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Facilities which expose console information to sysfs or procfs can use the
new list protection to keep the list stable. No need to hold console lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/tty/tty_io.c   |    6 +++---
 fs/proc/consoles.c     |    6 +++---
 kernel/printk/printk.c |    8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3535,8 +3535,8 @@ static ssize_t show_cons_active(struct d
 	struct console *c;
 	ssize_t count = 0;
 
-	console_lock();
-	for_each_console(c) {
+	console_list_lock();
+	for_each_registered_console(c) {
 		if (!c->device)
 			continue;
 		if (!c->write)
@@ -3560,7 +3560,7 @@ static ssize_t show_cons_active(struct d
 
 		count += sprintf(buf + count, "%c", i ? ' ':'\n');
 	}
-	console_unlock();
+	console_list_unlock();
 
 	return count;
 }
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -63,8 +63,8 @@ static void *c_start(struct seq_file *m,
 	struct console *con;
 	loff_t off = 0;
 
-	console_lock();
-	for_each_console(con)
+	console_list_lock();
+	for_each_registered_console(con)
 		if (off++ == *pos)
 			break;
 
@@ -80,7 +80,7 @@ static void *c_next(struct seq_file *m,
 
 static void c_stop(struct seq_file *m, void *v)
 {
-	console_unlock();
+	console_list_unlock();
 }
 
 static const struct seq_operations consoles_op = {
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2985,18 +2985,18 @@ void console_flush_on_panic(enum con_flu
  */
 struct tty_driver *console_device(int *index)
 {
-	struct console *c;
 	struct tty_driver *driver = NULL;
+	struct console *c;
 
-	console_lock();
-	for_each_console(c) {
+	console_list_lock();
+	for_each_registered_console(c) {
 		if (!c->device)
 			continue;
 		driver = c->device(c, index);
 		if (driver)
 			break;
 	}
-	console_unlock();
+	console_list_unlock();
 	return driver;
 }
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 08/29] parisc: Put console abuse into one place
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (6 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 07/29] printk: Convert console list walks for readers to list lock Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 14:56   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function Thomas Gleixner
                   ` (24 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

PARISC has a hope based mechanism to restore consoles in case of a HPMC
(machine check exception) which is scattered over several places.

Move it into one place to make further changes simpler and add comments.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/parisc/include/asm/pdc.h |    2 +-
 arch/parisc/kernel/pdc_cons.c |   38 +++++++++++++++++++++++++++++++-------
 arch/parisc/kernel/traps.c    |   17 +++++------------
 3 files changed, 37 insertions(+), 20 deletions(-)

--- a/arch/parisc/include/asm/pdc.h
+++ b/arch/parisc/include/asm/pdc.h
@@ -20,7 +20,7 @@ extern unsigned long parisc_pat_pdc_cap;
 #define PDC_TYPE_SNAKE		 2 /* Doesn't support SYSTEM_MAP */
 
 void pdc_console_init(void);	/* in pdc_console.c */
-void pdc_console_restart(void);
+void pdc_console_restart(bool hpmc);
 
 void setup_pdc(void);		/* in inventory.c */
 
--- a/arch/parisc/kernel/pdc_cons.c
+++ b/arch/parisc/kernel/pdc_cons.c
@@ -237,20 +237,44 @@ void __init pdc_console_init(void)
 
 
 /*
- * Used for emergencies. Currently only used if an HPMC occurs. If an
- * HPMC occurs, it is possible that the current console may not be
- * properly initialised after the PDC IO reset. This routine unregisters
- * all of the current consoles, reinitializes the pdc console and
- * registers it.
+ * <Historical comments>
+ *
+ * Used for emergencies.
+ *
+ *  - If an HPMC occurs, it is possible that the current console may not be
+ *    properly initialised after the PDC IO reset. This routine unregisters
+ *    all of the current consoles, reinitializes the pdc console and registers
+ *    it.
+ *
+ *  - Maybe the kernel hasn't booted very far yet and hasn't been able
+ *    to initialize the serial or STI console. In that case we should
+ *    re-enable the pdc console, so that the user will be able to
+ *    identify the problem.
+ *
+ * </Historical comments>
+ *
+ * The above is all wishful thinking:
+ *
+ *  - Invoking [un]register_console() from exception contexts is obviously
+ *    unsafe.
+ *
+ *  - If the HPMC left the machine in unpleasant state and the pdc console
+ *    was already initialized, but later removed due to CON_BOOT then this
+ *    will do nothing.
+ *
+ * Pretend that any of the below works in the same way as we pretend that
+ * any of PARISC works.
  */
-
-void pdc_console_restart(void)
+void pdc_console_restart(bool hpmc)
 {
 	struct console *console;
 
 	if (pdc_console_initialized)
 		return;
 
+	if (!hpmc && console_drivers)
+		return;
+
 	/* If we've already seen the output, don't bother to print it again */
 	if (console_drivers != NULL)
 		pdc_cons.flags &= ~CON_PRINTBUFFER;
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -235,17 +235,12 @@ void die_if_kernel(char *str, struct pt_
 			"                 (__)\\       )\\/\\\n"
 			"                  U  ||----w |\n"
 			"                     ||     ||\n");
-	
+
 	/* unlock the pdc lock if necessary */
 	pdc_emergency_unlock();
 
-	/* maybe the kernel hasn't booted very far yet and hasn't been able 
-	 * to initialize the serial or STI console. In that case we should 
-	 * re-enable the pdc console, so that the user will be able to 
-	 * identify the problem. */
-	if (!console_drivers)
-		pdc_console_restart();
-	
+	pdc_console_restart(false);
+
 	if (err)
 		printk(KERN_CRIT "%s (pid %d): %s (code %ld)\n",
 			current->comm, task_pid_nr(current), str, err);
@@ -429,9 +424,7 @@ void parisc_terminate(char *msg, struct
 	/* unlock the pdc lock if necessary */
 	pdc_emergency_unlock();
 
-	/* restart pdc console if necessary */
-	if (!console_drivers)
-		pdc_console_restart();
+	pdc_console_restart(false);
 
 	/* Not all paths will gutter the processor... */
 	switch(code){
@@ -483,7 +476,7 @@ void notrace handle_interruption(int cod
 	int si_code;
 
 	if (code == 1)
-	    pdc_console_restart();  /* switch back to pdc if HPMC */
+	    pdc_console_restart(true);  /* switch back to pdc if HPMC */
 	else if (!irqs_disabled_flags(regs->gr[0]))
 	    local_irq_enable();
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (7 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 08/29] parisc: Put console abuse into one place Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 14:59   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 10/29] kgbd: Pretend that console list walk is safe Thomas Gleixner
                   ` (23 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Unprotected list walks are not necessarily safe.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/tty/serial/kgdboc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -193,6 +193,7 @@ static int configure_kgdboc(void)
 	if (!p)
 		goto noconfig;
 
+	console_lock();
 	for_each_console(cons) {
 		int idx;
 		if (cons->device && cons->device(cons, &idx) == p &&
@@ -201,6 +202,7 @@ static int configure_kgdboc(void)
 			break;
 		}
 	}
+	console_unlock();
 
 	kgdb_tty_driver = p;
 	kgdb_tty_line = tty_line;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 10/29] kgbd: Pretend that console list walk is safe
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (8 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-14 15:03   ` Sergey Senozhatsky
  2022-09-10 22:27 ` [patch RFC 11/29] printk: Convert console_drivers list to hlist Thomas Gleixner
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Provide a special list iterator macro for KGDB to allow unprotected list
walks and add a few comments to explain the hope based approach.

Preperatory change for changing the console list to hlist and adding
lockdep asserts to regular list walks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/tty/serial/kgdboc.c |    5 ++++-
 include/linux/console.h     |   12 +++++++++++-
 kernel/debug/kdb/kdb_io.c   |    7 ++++++-
 3 files changed, 21 insertions(+), 3 deletions(-)

--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -462,10 +462,13 @@ static void kgdboc_earlycon_pre_exp_hand
 	 * we have no other choice so we keep using it.  Since not all
 	 * serial drivers might be OK with this, print a warning once per
 	 * boot if we detect this case.
+	 *
+	 * Pretend that walking the console list is safe...
 	 */
-	for_each_console(con)
+	for_each_console_kgdb(con) {
 		if (con == kgdboc_earlycon_io_ops.cons)
 			return;
+	}
 
 	already_warned = true;
 	pr_warn("kgdboc_earlycon is still using bootconsole\n");
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -184,7 +184,17 @@ extern void console_list_unlock(void);
  * Requires console_lock to be held which guarantees that the
  * list is immutable.
  */
-#define for_each_console(con)						\
+#define for_each_console(con) \
+	for (con = console_drivers; con != NULL; con = con->next)
+
+/**
+ * for_each_console_kgdb() - Iterator over registered consoles for KGDB
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Has no serialization requirements and KGDB pretends that this is safe.
+ * Don't use outside of the KGDB fairy tale land!
+ */
+#define for_each_console_kgdb(con)				\
 	for (con = console_drivers; con != NULL; con = con->next)
 
 extern int console_set_on_cmdline;
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -558,7 +558,12 @@ static void kdb_msg_write(const char *ms
 		cp++;
 	}
 
-	for_each_console(c) {
+	/*
+	 * This is a completely unprotected list walk designed by the
+	 * wishful thinking departement. See the oops_inprogress comment
+	 * below - especially the encourage section...
+	 */
+	for_each_console_kgdb(c) {
 		if (!(c->flags & CON_ENABLED))
 			continue;
 		if (c == dbg_io_ops->cons)


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 11/29] printk: Convert console_drivers list to hlist
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (9 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 10/29] kgbd: Pretend that console list walk is safe Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 12/29] printk: Prepare for SCRU console list protection Thomas Gleixner
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Replace the open coded single linked list with a hlist so a conversion to
SRCU protected list walks can reuse the existing primitives.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/parisc/kernel/pdc_cons.c |   17 ++-----
 fs/proc/consoles.c            |    5 +-
 include/linux/console.h       |   19 +++++---
 kernel/printk/printk.c        |   99 ++++++++++++++++++++++--------------------
 4 files changed, 75 insertions(+), 65 deletions(-)

--- a/arch/parisc/kernel/pdc_cons.c
+++ b/arch/parisc/kernel/pdc_cons.c
@@ -147,13 +147,8 @@ static int __init pdc_console_tty_driver
 
 	struct console *tmp;
 
-	console_lock();
-	for_each_console(tmp)
-		if (tmp == &pdc_cons)
-			break;
-	console_unlock();
-
-	if (!tmp) {
+	/* Pretend that this works as much as it pretended to work historically */
+	if (hlist_unhashed_lockless(&pdc_cons.node)) {
 		printk(KERN_INFO "PDC console driver not registered anymore, not creating %s\n", pdc_cons.name);
 		return -ENODEV;
 	}
@@ -272,15 +267,15 @@ void pdc_console_restart(bool hpmc)
 	if (pdc_console_initialized)
 		return;
 
-	if (!hpmc && console_drivers)
+	if (!hpmc && !hlist_empty(&console_list))
 		return;
 
 	/* If we've already seen the output, don't bother to print it again */
-	if (console_drivers != NULL)
+	if (!hlist_empty(&console_list))
 		pdc_cons.flags &= ~CON_PRINTBUFFER;
 
-	while ((console = console_drivers) != NULL)
-		unregister_console(console_drivers);
+	while (!hlist_empty(&console_list))
+		unregister_console(READ_ONCE(console_list.first));
 
 	/* force registering the pdc console */
 	pdc_console_init_force();
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -74,8 +74,11 @@ static void *c_start(struct seq_file *m,
 static void *c_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct console *con = v;
+
 	++*pos;
-	return con->next;
+	hlist_for_each_entry_continue(con, node)
+		break;
+	return con;
 }
 
 static void c_stop(struct seq_file *m, void *v)
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -15,6 +15,7 @@
 #define _LINUX_CONSOLE_H_ 1
 
 #include <linux/atomic.h>
+#include <linux/list.h>
 #include <linux/types.h>
 
 struct vc_data;
@@ -154,15 +155,19 @@ struct console {
 	u64	seq;
 	unsigned long dropped;
 	void	*data;
-	struct	 console *next;
+	struct hlist_node node;
 };
 
 #ifdef CONFIG_LOCKDEP
+extern void lockdep_assert_console_lock_held(void);
 extern void lockdep_assert_console_list_lock_held(void);
 #else
+static inline void lockdep_assert_console_lock_held(void) { }
 static inline void lockdep_assert_console_list_lock_held(void) { }
 #endif
 
+extern struct hlist_head console_list;
+
 extern void console_list_lock(void) __acquires(console_mutex);
 extern void console_list_unlock(void) __releases(console_mutex);
 
@@ -175,7 +180,7 @@ extern void console_list_unlock(void) __
  */
 #define for_each_registered_console(con)				\
 	lockdep_assert_console_list_lock_held();			\
-	for (con = console_drivers; con != NULL; con = con->next)
+	hlist_for_each_entry(con, &console_list, node)
 
 /**
  * for_each_console() - Iterator over registered consoles
@@ -184,8 +189,9 @@ extern void console_list_unlock(void) __
  * Requires console_lock to be held which guarantees that the
  * list is immutable.
  */
-#define for_each_console(con) \
-	for (con = console_drivers; con != NULL; con = con->next)
+#define for_each_console(con)						\
+	lockdep_assert_console_lock_held();				\
+	hlist_for_each_entry(con, &console_list, node)
 
 /**
  * for_each_console_kgdb() - Iterator over registered consoles for KGDB
@@ -194,8 +200,8 @@ extern void console_list_unlock(void) __
  * Has no serialization requirements and KGDB pretends that this is safe.
  * Don't use outside of the KGDB fairy tale land!
  */
-#define for_each_console_kgdb(con)				\
-	for (con = console_drivers; con != NULL; con = con->next)
+#define for_each_console_kgdb(con)					\
+	hlist_for_each_entry(con, &console_list, node)
 
 extern int console_set_on_cmdline;
 extern struct console *early_console;
@@ -208,7 +214,6 @@ enum con_flush_mode {
 extern int add_preferred_console(char *name, int idx, char *options);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
-extern struct console *console_drivers;
 extern void console_lock(void);
 extern int console_trylock(void);
 extern void console_unlock(void);
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -79,17 +79,17 @@ int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
 /*
- * console_sem protects the console_drivers list, and also provides
- * serialization for access to the entire console driver system.
+ * console_sem protects onsole_list, and also provides serialization for
+ * access to the entire console driver system.
  *
  * console_mutex serializes register/unregister. console_sem has to be
- * taken for any list manipulation inside the console_mutex locked
+ * taken for any console_list manipulation inside the console_mutex locked
  * section to keep the console BKL machinery happy.
  */
 static DEFINE_MUTEX(console_mutex);
 static DEFINE_SEMAPHORE(console_sem);
-struct console *console_drivers;
-EXPORT_SYMBOL_GPL(console_drivers);
+HLIST_HEAD(console_list);
+EXPORT_SYMBOL_GPL(console_list);
 
 /*
  * System may need to suppress printk message under certain
@@ -108,6 +108,11 @@ static struct lockdep_map console_lock_d
 	.name = "console_lock"
 };
 
+void lockdep_assert_console_lock_held(void)
+{
+	lockdep_assert(lock_is_held(&console_lock_dep_map));
+}
+
 void lockdep_assert_console_list_lock_held(void)
 {
 	lockdep_assert_held(&console_mutex);
@@ -2586,7 +2591,7 @@ static int console_cpu_notify(unsigned i
  * console_lock - lock the console system for exclusive use.
  *
  * Acquires a lock which guarantees that the caller has
- * exclusive access to the console system and the console_drivers list.
+ * exclusive access to the console system and console_list.
  *
  * Can sleep, returns nothing.
  */
@@ -2606,7 +2611,7 @@ EXPORT_SYMBOL(console_lock);
  * console_trylock - try to lock the console system for exclusive use.
  *
  * Try to acquire a lock which guarantees that the caller has exclusive
- * access to the console system and the console_drivers list.
+ * access to the console system and console_list.
  *
  * returns 1 on success, and 0 on failure to acquire the lock.
  */
@@ -2974,7 +2979,15 @@ void console_flush_on_panic(enum con_flu
 		u64 seq;
 
 		seq = prb_first_valid_seq(prb);
-		for_each_console(c)
+		/*
+		 * This cannot use for_each_console() because it's not established
+		 * that the current context has console locked and neither there is
+		 * a guarantee that there is no concurrency in that case.
+		 *
+		 * Open code it for documentation purposes and pretend that
+		 * it works.
+		 */
+		hlist_for_each_entry(c, &console_list, node)
 			c->seq = seq;
 	}
 	console_unlock();
@@ -3115,6 +3128,9 @@ static void try_enable_default_console(s
 	       (con->flags & CON_BOOT) ? "boot" : "",	\
 	       con->name, con->index, ##__VA_ARGS__)
 
+#define cons_first()					\
+	hlist_entry(console_list.first, struct console, node)
+
 static int console_unregister_locked(struct console *console);
 
 /*
@@ -3177,8 +3193,8 @@ void register_console(struct console *ne
 	 * flag set and will be first in the list.
 	 */
 	if (preferred_console < 0) {
-		if (!console_drivers || !console_drivers->device ||
-		    console_drivers->flags & CON_BOOT) {
+		if (hlist_empty(&console_list) || !cons_first()->device ||
+		    cons_first()->flags & CON_BOOT) {
 			try_enable_default_console(newcon);
 		}
 	}
@@ -3206,21 +3222,17 @@ void register_console(struct console *ne
 	}
 
 	/*
-	 *	Put this console in the list - keep the
-	 *	preferred driver at the head of the list.
+	 * Put this console in the list and keep the referred driver at the
+	 * head of the list.
 	 */
 	console_lock();
-	if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
-		newcon->next = console_drivers;
-		console_drivers = newcon;
-		if (newcon->next)
-			newcon->next->flags &= ~CON_CONSDEV;
-		/* Ensure this flag is always set for the head of the list */
-		newcon->flags |= CON_CONSDEV;
-	} else {
-		newcon->next = console_drivers->next;
-		console_drivers->next = newcon;
-	}
+	if (newcon->flags & CON_CONSDEV || hlist_empty(&console_list))
+		hlist_add_head(&newcon->node, &console_list);
+	else
+		hlist_add_behind(&newcon->node, console_list.first);
+
+	/* Ensure this flag is always set for the head of the list */
+	cons_first()->flags |= CON_CONSDEV;
 
 	newcon->dropped = 0;
 	if (newcon->flags & CON_PRINTBUFFER) {
@@ -3246,7 +3258,9 @@ void register_console(struct console *ne
 	if (bootcon_enabled &&
 	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
 	    !keep_bootcon) {
-		for_each_console(con) {
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(con, tmp, &console_list, node) {
 			if (con->flags & CON_BOOT)
 				console_unregister_locked(con);
 		}
@@ -3258,7 +3272,6 @@ EXPORT_SYMBOL(register_console);
 
 static int console_unregister_locked(struct console *console)
 {
-	struct console *con;
 	int res;
 
 	con_printk(KERN_INFO, console, "disabled\n");
@@ -3269,32 +3282,28 @@ static int console_unregister_locked(str
 	if (res > 0)
 		return 0;
 
-	res = -ENODEV;
 	console_lock();
-	if (console_drivers == console) {
-		console_drivers=console->next;
-		res = 0;
-	} else {
-		for_each_console(con) {
-			if (con->next == console) {
-				con->next = console->next;
-				res = 0;
-				break;
-			}
-		}
-	}
 
-	if (res)
-		goto out_disable_unlock;
+	/* Disable it unconditionally */
+	console->flags &= ~CON_ENABLED;
+
+	if (hlist_unhashed(&console->node))
+		goto out_unlock;
+
+	hlist_del_init(&console->node);
 
 	/*
+	 * <HISTORICAL>
 	 * If this isn't the last console and it has CON_CONSDEV set, we
 	 * need to set it on the next preferred console.
+	 * </HISTORICAL>
+	 *
+	 * The above makes no sense as there is no guarantee that the next
+	 * console has any device attached. Oh well....
 	 */
-	if (console_drivers != NULL && console->flags & CON_CONSDEV)
-		console_drivers->flags |= CON_CONSDEV;
+	if (!hlist_empty(&console_list) && console->flags & CON_CONSDEV)
+		cons_first()->flags |= CON_CONSDEV;
 
-	console->flags &= ~CON_ENABLED;
 	console_unlock();
 	console_sysfs_notify();
 
@@ -3303,10 +3312,8 @@ static int console_unregister_locked(str
 
 	return res;
 
-out_disable_unlock:
-	console->flags &= ~CON_ENABLED;
+out_unlock:
 	console_unlock();
-
 	return res;
 }
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 12/29] printk: Prepare for SCRU console list protection
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (10 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 11/29] printk: Convert console_drivers list to hlist Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 13/29] printk: Move buffer size defines Thomas Gleixner
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Provide a SRCU protected variant to walk the console list.

Preperatory change for a new console infrastructure which operates
independent of console BKL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   14 +++++++++++++-
 kernel/printk/printk.c  |   16 +++++++++++++---
 2 files changed, 26 insertions(+), 4 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -15,7 +15,7 @@
 #define _LINUX_CONSOLE_H_ 1
 
 #include <linux/atomic.h>
-#include <linux/list.h>
+#include <linux/rculist.h>
 #include <linux/types.h>
 
 struct vc_data;
@@ -161,6 +161,7 @@ struct console {
 #ifdef CONFIG_LOCKDEP
 extern void lockdep_assert_console_lock_held(void);
 extern void lockdep_assert_console_list_lock_held(void);
+extern bool console_srcu_read_lock_is_held(void);
 #else
 static inline void lockdep_assert_console_lock_held(void) { }
 static inline void lockdep_assert_console_list_lock_held(void) { }
@@ -172,6 +173,17 @@ extern void console_list_lock(void) __ac
 extern void console_list_unlock(void) __releases(console_mutex);
 
 /**
+ * for_each_console_srcu() - Iterator over registered consoles
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Requires console_srcu_read_lock to be held. Can be invoked from
+ * any context.
+ */
+#define for_each_console_srcu(con)					\
+	hlist_for_each_entry_srcu(con, &console_list, node,		\
+				  console_srcu_read_lock_is_held())
+
+/**
  * for_each_registered_console() - Iterator over registered consoles
  * @con:	struct console pointer used as loop cursor
  *
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -90,6 +90,7 @@ static DEFINE_MUTEX(console_mutex);
 static DEFINE_SEMAPHORE(console_sem);
 HLIST_HEAD(console_list);
 EXPORT_SYMBOL_GPL(console_list);
+DEFINE_STATIC_SRCU(console_srcu);
 
 /*
  * System may need to suppress printk message under certain
@@ -118,6 +119,10 @@ void lockdep_assert_console_list_lock_he
 	lockdep_assert_held(&console_mutex);
 }
 
+bool console_srcu_read_lock_is_held(void)
+{
+	return srcu_read_lock_held(&console_srcu);
+}
 #endif
 
 enum devkmsg_log_bits {
@@ -3227,9 +3232,9 @@ void register_console(struct console *ne
 	 */
 	console_lock();
 	if (newcon->flags & CON_CONSDEV || hlist_empty(&console_list))
-		hlist_add_head(&newcon->node, &console_list);
+		hlist_add_head_rcu(&newcon->node, &console_list);
 	else
-		hlist_add_behind(&newcon->node, console_list.first);
+		hlist_add_behind_rcu(&newcon->node, console_list.first);
 
 	/* Ensure this flag is always set for the head of the list */
 	cons_first()->flags |= CON_CONSDEV;
@@ -3245,6 +3250,7 @@ void register_console(struct console *ne
 		newcon->seq = prb_next_seq(prb);
 	}
 	console_unlock();
+	/* No need to synchronize SRCU here! */
 	console_sysfs_notify();
 
 	/*
@@ -3290,7 +3296,7 @@ static int console_unregister_locked(str
 	if (hlist_unhashed(&console->node))
 		goto out_unlock;
 
-	hlist_del_init(&console->node);
+	hlist_del_init_rcu(&console->node);
 
 	/*
 	 * <HISTORICAL>
@@ -3305,6 +3311,10 @@ static int console_unregister_locked(str
 		cons_first()->flags |= CON_CONSDEV;
 
 	console_unlock();
+
+	/* Ensure that all SRCU list walks have completed */
+	synchronize_srcu(&console_srcu);
+
 	console_sysfs_notify();
 
 	if (console->exit)


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 13/29] printk: Move buffer size defines
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (11 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 12/29] printk: Prepare for SCRU console list protection Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 14/29] printk: Document struct console Thomas Gleixner
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Move the buffer size defines to console.h in preperation of adding a buffer
structure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   14 ++++++++++++++
 include/linux/printk.h  |    2 --
 kernel/printk/printk.c  |    8 --------
 3 files changed, 14 insertions(+), 10 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -122,6 +122,20 @@ static inline int con_debug_leave(void)
 #define CM_ERASE    (2)
 #define CM_MOVE     (3)
 
+#ifdef CONFIG_PRINTK
+/* The maximum size of a formatted record (i.e. with prefix added per line) */
+#define CONSOLE_LOG_MAX		1024
+
+/* The maximum size for a dropped text message */
+#define DROPPED_TEXT_MAX	64
+#else
+#define CONSOLE_LOG_MAX		0
+#define DROPPED_TEXT_MAX	0
+#endif
+
+/* The maximum size of an formatted extended record */
+#define CONSOLE_EXT_LOG_MAX	8192
+
 /*
  * The interface for a console, or any other device that wants to capture
  * console messages (printer driver?)
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -44,8 +44,6 @@ static inline const char *printk_skip_he
 	return buffer;
 }
 
-#define CONSOLE_EXT_LOG_MAX	8192
-
 /* printk's without a loglevel use this.. */
 #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
 
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -428,12 +428,6 @@ static struct latched_seq clear_seq = {
 #define PREFIX_MAX		32
 #endif
 
-/* the maximum size of a formatted record (i.e. with prefix added per line) */
-#define CONSOLE_LOG_MAX		1024
-
-/* the maximum size for a dropped text message */
-#define DROPPED_TEXT_MAX	64
-
 /* the maximum size allowed to be reserved for a record */
 #define LOG_LINE_MAX		(CONSOLE_LOG_MAX - PREFIX_MAX)
 
@@ -2338,8 +2332,6 @@ static bool __pr_flush(struct console *c
 
 #else /* CONFIG_PRINTK */
 
-#define CONSOLE_LOG_MAX		0
-#define DROPPED_TEXT_MAX	0
 #define printk_time		false
 
 #define prb_read_valid(rb, seq, r)	false


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 14/29] printk: Document struct console
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (12 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 13/29] printk: Move buffer size defines Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 15/29] printk: Add struct cons_text_buf Thomas Gleixner
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Add docbook comments to struct console.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   94 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 67 insertions(+), 27 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -15,6 +15,7 @@
 #define _LINUX_CONSOLE_H_ 1
 
 #include <linux/atomic.h>
+#include <linux/bits.h>
 #include <linux/rculist.h>
 #include <linux/types.h>
 
@@ -139,37 +140,76 @@ static inline int con_debug_leave(void)
 /*
  * The interface for a console, or any other device that wants to capture
  * console messages (printer driver?)
- *
- * If a console driver is marked CON_BOOT then it will be auto-unregistered
- * when the first real console is registered.  This is for early-printk drivers.
  */
 
-#define CON_PRINTBUFFER	(1)
-#define CON_CONSDEV	(2) /* Preferred console, /dev/console */
-#define CON_ENABLED	(4)
-#define CON_BOOT	(8)
-#define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
-#define CON_BRL		(32) /* Used for a braille device */
-#define CON_EXTENDED	(64) /* Use the extended output format a la /dev/kmsg */
+/**
+ * cons_flags - General console flags
+ * @CON_PRINTBUFFER:	Print the complete dmesg backlog on register/enable
+ * @CON_CONSDEV:	Questionable historical leftover to denote which console
+ *			driver is the preferred console which is defining what
+ *			backs up /dev/console
+ * @CON_ENABLED:	General enable state subject to note #1
+ * @CON_BOOT:		Marks the console driver as early console driver which
+ *			is used during boot before the real driver becomes available.
+ *			It will be automatically unregistered unless the early console
+ *			command line parameter for this console has the 'keep' option set.
+ * @CON_ANYTIME:	A misnomed historical flag which tells the core code that the
+ *			legacy @console::write callback can be invoked on a CPU which
+ *			is marked OFFLINE. That's misleading as it suggests that there
+ *			is no contextual limit for invoking the callback.
+ * @CON_BRL:		Indicates a braille device which is exempt from receiving the
+ *			printk spam for obvious reasons
+ * @CON_EXTENDED:	The console supports the extended output format of /dev/kmesg
+ *			which requires a larger output record buffer
+ */
+enum cons_flags {
+	CON_PRINTBUFFER		= BIT(0),
+	CON_CONSDEV		= BIT(1),
+	CON_ENABLED		= BIT(2),
+	CON_BOOT		= BIT(3),
+	CON_ANYTIME		= BIT(4),
+	CON_BRL			= BIT(5),
+	CON_EXTENDED		= BIT(6),
+};
 
+/**
+ * struct console - The console descriptor structure
+ * @name:		The name of the console driver
+ * @write:		Write callback to output messages (Optional)
+ * @read:		Read callback for console input (Optional)
+ * @device:		The underlying TTY device driver (Optional)
+ * @unblank:		Callback to unblank the console (Optional)
+ * @setup:		Callback for initializing the console (Optional)
+ * @exit:		Callback for teardown of the console (Optional)
+ * @match:		Callback for matching a console (Optional)
+ * @flags:		Console flags. See enum cons_flags
+ * @index:		Console index, e.g. port number
+ * @cflag:		TTY control mode flags
+ * @ispeed:		TTY input speed
+ * @ospeed:		TTY output speed
+ * @seq:		Sequence number of the last ringbuffer record printed
+ * @dropped:		Number of dropped ringbuffer records
+ * @data:		Driver private data
+ * @node:		hlist node for the console list
+ */
 struct console {
-	char	name[16];
-	void	(*write)(struct console *, const char *, unsigned);
-	int	(*read)(struct console *, char *, unsigned);
-	struct tty_driver *(*device)(struct console *, int *);
-	void	(*unblank)(void);
-	int	(*setup)(struct console *, char *);
-	int	(*exit)(struct console *);
-	int	(*match)(struct console *, char *name, int idx, char *options);
-	short	flags;
-	short	index;
-	int	cflag;
-	uint	ispeed;
-	uint	ospeed;
-	u64	seq;
-	unsigned long dropped;
-	void	*data;
-	struct hlist_node node;
+	char			name[16];
+	void			(*write)(struct console *, const char *, unsigned);
+	int			(*read)(struct console *, char *, unsigned);
+	struct tty_driver	 *(*device)(struct console *, int *);
+	void			(*unblank)(void);
+	int			(*setup)(struct console *, char *);
+	int			(*exit)(struct console *);
+	int			(*match)(struct console *, char *name, int idx, char *options);
+	short			flags;
+	short			index;
+	int			cflag;
+	uint			ispeed;
+	uint			ospeed;
+	u64			seq;
+	unsigned long		dropped;
+	void			*data;
+	struct hlist_node	node;
 };
 
 #ifdef CONFIG_LOCKDEP


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 15/29] printk: Add struct cons_text_buf
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (13 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 14/29] printk: Document struct console Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 16/29] printk: Use " Thomas Gleixner
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Create a data structure to replace the open coded seperate buffers for
regular and extended formatting.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -173,6 +173,20 @@ enum cons_flags {
 };
 
 /**
+ * struct cons_text_buf - console output text buffer
+ * @ext_text:		Buffer for extended log format text
+ * @dropped_text:	Buffer for dropped text
+ * @text:		Buffer for ringbuffer text
+ */
+struct cons_text_buf {
+	union {
+		char	ext_text[CONSOLE_EXT_LOG_MAX];
+		char	dropped_text[DROPPED_TEXT_MAX];
+	};
+	char		text[CONSOLE_LOG_MAX];
+};
+
+/**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
  * @write:		Write callback to output messages (Optional)


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 16/29] printk: Use struct cons_text_buf
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (14 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 15/29] printk: Add struct cons_text_buf Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 17/29] printk: Use an output descriptor struct for emit Thomas Gleixner
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Replace the seperately allocated output buffers with a single instance of
struct cons_text_buf.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |   50 ++++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 29 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -666,11 +666,9 @@ struct devkmsg_user {
 	atomic64_t seq;
 	struct ratelimit_state rs;
 	struct mutex lock;
-	char buf[CONSOLE_EXT_LOG_MAX];
-
 	struct printk_info info;
-	char text_buf[CONSOLE_EXT_LOG_MAX];
 	struct printk_record record;
+	struct cons_text_buf txtbuf;
 };
 
 static __printf(3, 4) __cold
@@ -753,6 +751,8 @@ static ssize_t devkmsg_read(struct file
 {
 	struct devkmsg_user *user = file->private_data;
 	struct printk_record *r = &user->record;
+	char *outbuf = user->txtbuf.ext_text;
+	const int maxlen = sizeof(user->txtbuf.ext_text);
 	size_t len;
 	ssize_t ret;
 
@@ -793,8 +793,8 @@ static ssize_t devkmsg_read(struct file
 		goto out;
 	}
 
-	len = info_print_ext_header(user->buf, sizeof(user->buf), r->info);
-	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
+	len = info_print_ext_header(outbuf, maxlen, r->info);
+	len += msg_print_ext_body(outbuf + len, maxlen - len,
 				  &r->text_buf[0], r->info->text_len,
 				  &r->info->dev_info);
 
@@ -805,7 +805,7 @@ static ssize_t devkmsg_read(struct file
 		goto out;
 	}
 
-	if (copy_to_user(buf, user->buf, len)) {
+	if (copy_to_user(buf, outbuf, len)) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -904,7 +904,8 @@ static int devkmsg_open(struct inode *in
 	mutex_init(&user->lock);
 
 	prb_rec_init_rd(&user->record, &user->info,
-			&user->text_buf[0], sizeof(user->text_buf));
+			user->txtbuf.text,
+			sizeof(user->txtbuf.text));
 
 	atomic64_set(&user->seq, prb_first_valid_seq(prb));
 
@@ -2704,8 +2705,8 @@ static void __console_unlock(void)
  *
  * Requires the console_lock.
  */
-static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
-				     char *dropped_text, bool *handover)
+static bool console_emit_next_record(struct console *con, struct cons_text_buf *txtbuf,
+				     bool *handover, bool extmsg)
 {
 	static int panic_console_dropped;
 	struct printk_info info;
@@ -2714,7 +2715,7 @@ static bool console_emit_next_record(str
 	char *write_text;
 	size_t len;
 
-	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
+	prb_rec_init_rd(&r, &info, txtbuf->text, CONSOLE_LOG_MAX);
 
 	*handover = false;
 
@@ -2736,13 +2737,13 @@ static bool console_emit_next_record(str
 		goto skip;
 	}
 
-	if (ext_text) {
-		write_text = ext_text;
-		len = info_print_ext_header(ext_text, CONSOLE_EXT_LOG_MAX, r.info);
-		len += msg_print_ext_body(ext_text + len, CONSOLE_EXT_LOG_MAX - len,
+	if (extmsg) {
+		write_text = txtbuf->ext_text;
+		len = info_print_ext_header(write_text, CONSOLE_EXT_LOG_MAX, r.info);
+		len += msg_print_ext_body(write_text + len, CONSOLE_EXT_LOG_MAX - len,
 					  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
 	} else {
-		write_text = text;
+		write_text = txtbuf->text;
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
 	}
 
@@ -2760,7 +2761,7 @@ static bool console_emit_next_record(str
 	console_lock_spinning_enable();
 
 	stop_critical_timings();	/* don't trace print latency */
-	call_console_driver(con, write_text, len, dropped_text);
+	call_console_driver(con, write_text, len, extmsg ? NULL : txtbuf->dropped_text);
 	start_critical_timings();
 
 	con->seq++;
@@ -2796,9 +2797,7 @@ static bool console_emit_next_record(str
  */
 static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
 {
-	static char dropped_text[DROPPED_TEXT_MAX];
-	static char ext_text[CONSOLE_EXT_LOG_MAX];
-	static char text[CONSOLE_LOG_MAX];
+	static struct cons_text_buf txtbuf;
 	bool any_usable = false;
 	struct console *con;
 	bool any_progress;
@@ -2816,16 +2815,9 @@ static bool console_flush_all(bool do_co
 				continue;
 			any_usable = true;
 
-			if (con->flags & CON_EXTENDED) {
-				/* Extended consoles do not print "dropped messages". */
-				progress = console_emit_next_record(con, &text[0],
-								    &ext_text[0], NULL,
-								    handover);
-			} else {
-				progress = console_emit_next_record(con, &text[0],
-								    NULL, &dropped_text[0],
-								    handover);
-			}
+			progress = console_emit_next_record(con, &txtbuf, handover,
+							    con->flags & CON_EXTENDED);
+
 			if (*handover)
 				return false;
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 17/29] printk: Use an output descriptor struct for emit
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (15 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 16/29] printk: Use " Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:27 ` [patch RFC 18/29] printk: Handle dropped message smarter Thomas Gleixner
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

To prepare for a new console infrastructure which is independent of the
console BKL wrap the output mode into a descriptor struct so the new
infrastrucure can share the emit code which dumps the ringbuffer record
into the output text buffers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   15 +++++++
 kernel/printk/printk.c  |   94 ++++++++++++++++++++++++++++++++++--------------
 2 files changed, 82 insertions(+), 27 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,21 @@ struct cons_text_buf {
 };
 
 /**
+ * struct cons_outbuf_desc - console output buffer descriptor
+ * @txtbuf:		Pointer to buffer for storing the text
+ * @outbuf:		Pointer to the position in @buffer for
+ *			writing it out to the device
+ * @len:		Message length
+ * @extmsg:		Select extended format printing
+ */
+struct cons_outbuf_desc {
+	struct cons_text_buf	*txtbuf;
+	char			*outbuf;
+	unsigned int		len;
+	bool			extmsg;
+};
+
+/**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
  * @write:		Write callback to output messages (Optional)
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2684,40 +2684,39 @@ static void __console_unlock(void)
 	up_console_sem();
 }
 
-/*
- * Print one record for the given console. The record printed is whatever
- * record is the next available record for the given console.
- *
- * @text is a buffer of size CONSOLE_LOG_MAX.
- *
- * If extended messages should be printed, @ext_text is a buffer of size
- * CONSOLE_EXT_LOG_MAX. Otherwise @ext_text must be NULL.
- *
- * If dropped messages should be printed, @dropped_text is a buffer of size
- * DROPPED_TEXT_MAX. Otherwise @dropped_text must be NULL.
- *
- * @handover will be set to true if a printk waiter has taken over the
- * console_lock, in which case the caller is no longer holding the
- * console_lock. Otherwise it is set to false.
- *
- * Returns false if the given console has no next record to print, otherwise
- * true.
+
+/**
+ * cons_fill_outbuf - Fill the output buffer with the next record
+ * @con:	The console to print on
+ * @desc:	Pointer to the output descriptor
+ *
+ * The output descriptor contains all information which is necessary
+ * to print (buffer pointer, extended format selector...).
+ *
+ * Returns: False if there is no pending record in the ringbuffer
+ *	    True if there is a pending record in the ringbuffer.
+ *
+ * When the return value is true, then the caller has to check
+ * @desc->outbuf. If not NULL it points to the first character to write to
+ * the device and @desc->len contains the length of the message.
  *
- * Requires the console_lock.
+ * If it is NULL then records have been dropped or skipped and con->seq
+ * has been forwarded so the caller can try to print the next record.
  */
-static bool console_emit_next_record(struct console *con, struct cons_text_buf *txtbuf,
-				     bool *handover, bool extmsg)
+static bool cons_fill_outbuf(struct console *con, struct cons_outbuf_desc *desc)
 {
 	static int panic_console_dropped;
+
+	struct cons_text_buf *txtbuf = desc->txtbuf;
 	struct printk_info info;
 	struct printk_record r;
-	unsigned long flags;
 	char *write_text;
 	size_t len;
 
-	prb_rec_init_rd(&r, &info, txtbuf->text, CONSOLE_LOG_MAX);
+	desc->outbuf = NULL;
+	desc->len = 0;
 
-	*handover = false;
+	prb_rec_init_rd(&r, &info, txtbuf->text, CONSOLE_LOG_MAX);
 
 	if (!prb_read_valid(prb, con->seq, &r))
 		return false;
@@ -2734,10 +2733,10 @@ static bool console_emit_next_record(str
 	/* Skip record that has level above the console loglevel. */
 	if (suppress_message_printing(r.info->level)) {
 		con->seq++;
-		goto skip;
+		return true;
 	}
 
-	if (extmsg) {
+	if (desc->extmsg) {
 		write_text = txtbuf->ext_text;
 		len = info_print_ext_header(write_text, CONSOLE_EXT_LOG_MAX, r.info);
 		len += msg_print_ext_body(write_text + len, CONSOLE_EXT_LOG_MAX - len,
@@ -2747,6 +2746,47 @@ static bool console_emit_next_record(str
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
 	}
 
+	desc->len = len;
+	desc->outbuf = write_text;
+	return true;
+}
+
+/**
+ * console_emit_next_record - Print one record for the given console
+ * @con:	The console to print on
+ * @txtbuf:	Pointer to the output buffer
+ * @handover:	Pointer to Handover handshake storage
+ * @extmsg:	Selects extended message format
+ *
+ * The record printed is whatever record is the next available record for
+ * the given console.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * Returns false if the given console has no next record to print, otherwise
+ * true.
+ *
+ * Requires the console_lock.
+ */
+static bool console_emit_next_record(struct console *con, struct cons_text_buf *txtbuf,
+				     bool *handover, bool extmsg)
+{
+	struct cons_outbuf_desc desc = {
+		.txtbuf	= txtbuf,
+		.extmsg = extmsg,
+	};
+	unsigned long flags;
+
+	*handover = false;
+
+	if (!cons_fill_outbuf(con, &desc))
+		return false;
+
+	if (!desc.outbuf)
+		goto skip;
+
 	/*
 	 * While actively printing out messages, if another printk()
 	 * were to occur on another CPU, it may wait for this one to
@@ -2761,7 +2801,7 @@ static bool console_emit_next_record(str
 	console_lock_spinning_enable();
 
 	stop_critical_timings();	/* don't trace print latency */
-	call_console_driver(con, write_text, len, extmsg ? NULL : txtbuf->dropped_text);
+	call_console_driver(con, desc.outbuf, desc.len, extmsg ? NULL : txtbuf->dropped_text);
 	start_critical_timings();
 
 	con->seq++;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 18/29] printk: Handle dropped message smarter
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (16 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 17/29] printk: Use an output descriptor struct for emit Thomas Gleixner
@ 2022-09-10 22:27 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles Thomas Gleixner
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:27 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

No need for an extra buffer type. Regular formatting which needs the '$N
messages dropped' printout can sprintf() it into the unused extended text
buffer.

As a further simplification move the 'dropped' message right in front of
the actual record text and let the write function output it in one go.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |   12 ++----
 kernel/printk/printk.c  |   86 +++++++++++++++++++++++++++++++-----------------
 2 files changed, 61 insertions(+), 37 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -175,28 +175,26 @@ enum cons_flags {
 /**
  * struct cons_text_buf - console output text buffer
  * @ext_text:		Buffer for extended log format text
- * @dropped_text:	Buffer for dropped text
  * @text:		Buffer for ringbuffer text
  */
 struct cons_text_buf {
-	union {
-		char	ext_text[CONSOLE_EXT_LOG_MAX];
-		char	dropped_text[DROPPED_TEXT_MAX];
-	};
-	char		text[CONSOLE_LOG_MAX];
-};
+	char	ext_text[CONSOLE_EXT_LOG_MAX];
+	char	text[CONSOLE_LOG_MAX];
+} __no_randomize_layout;
 
 /**
  * struct cons_outbuf_desc - console output buffer descriptor
  * @txtbuf:		Pointer to buffer for storing the text
  * @outbuf:		Pointer to the position in @buffer for
  *			writing it out to the device
+ * @dropped:		The dropped count
  * @len:		Message length
  * @extmsg:		Select extended format printing
  */
 struct cons_outbuf_desc {
 	struct cons_text_buf	*txtbuf;
 	char			*outbuf;
+	unsigned long		dropped;
 	unsigned int		len;
 	bool			extmsg;
 };
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1957,27 +1957,6 @@ static int console_trylock_spinning(void
 }
 
 /*
- * Call the specified console driver, asking it to write out the specified
- * text and length. If @dropped_text is non-NULL and any records have been
- * dropped, a dropped message will be written out first.
- */
-static void call_console_driver(struct console *con, const char *text, size_t len,
-				char *dropped_text)
-{
-	size_t dropped_len;
-
-	if (con->dropped && dropped_text) {
-		dropped_len = snprintf(dropped_text, DROPPED_TEXT_MAX,
-				       "** %lu printk messages dropped **\n",
-				       con->dropped);
-		con->dropped = 0;
-		con->write(con, dropped_text, dropped_len);
-	}
-
-	con->write(con, text, len);
-}
-
-/*
  * Recursion is tracked separately on each CPU. If NMIs are supported, an
  * additional NMI context per CPU is also separately tracked. Until per-CPU
  * is available, a separate "early tracking" is performed.
@@ -2356,10 +2335,6 @@ static ssize_t msg_print_ext_body(char *
 				  struct dev_printk_info *dev_info) { return 0; }
 static void console_lock_spinning_enable(void) { }
 static int console_lock_spinning_disable_and_check(void) { return 0; }
-static void call_console_driver(struct console *con, const char *text, size_t len,
-				char *dropped_text)
-{
-}
 static bool suppress_message_printing(int level) { return false; }
 static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
@@ -2684,6 +2659,46 @@ static void __console_unlock(void)
 	up_console_sem();
 }
 
+/**
+ * cons_print_dropped - Print 'dropped' message if required
+ * @desc:	Pointer to the output descriptor
+ *
+ * Prints the 'dropped' message info the output buffer if @desc->dropped is
+ * not 0 and the regular format is requested. Extended format does not
+ * need this message because it prints the sequence numbers.
+ *
+ * In regular format the extended message buffer is not in use.
+ * So print into it at the beginning and move the resulting string
+ * just in front of the regular text buffer so that the message can
+ * be printed in one go.
+ *
+ * In case of a message this returns with @desc->outbuf and @desc->len
+ * updated. If no message is required then @desc is not modified.
+ */
+static void cons_print_dropped(struct cons_outbuf_desc *desc)
+{
+	struct cons_text_buf *txtbuf = desc->txtbuf;
+	size_t len;
+
+	if (!desc->dropped || desc->extmsg)
+		return;
+
+	if (WARN_ON_ONCE(desc->outbuf != txtbuf->text))
+		return;
+
+	/* Print it into ext_text which is unused */
+	len = snprintf(txtbuf->ext_text, DROPPED_TEXT_MAX,
+		       "** %lu printk messages dropped **\n", desc->dropped);
+	desc->dropped = 0;
+
+	/* Copy it just below text so it goes out with one write */
+	memcpy(txtbuf->text - len, txtbuf->ext_text, len);
+
+	/* Update the descriptor */
+	desc->len += len;
+	desc->outbuf -= len;
+}
+
 
 /**
  * cons_fill_outbuf - Fill the output buffer with the next record
@@ -2737,17 +2752,26 @@ static bool cons_fill_outbuf(struct cons
 	}
 
 	if (desc->extmsg) {
+		/*
+		 * Extended messages do not require "dropped" message
+		 * as that can be seen from the sequence number
+		 */
 		write_text = txtbuf->ext_text;
 		len = info_print_ext_header(write_text, CONSOLE_EXT_LOG_MAX, r.info);
 		len += msg_print_ext_body(write_text + len, CONSOLE_EXT_LOG_MAX - len,
 					  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
+		desc->len = len;
+		desc->outbuf = write_text;
 	} else {
-		write_text = txtbuf->text;
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
+
+		desc->len = len;
+		desc->outbuf = txtbuf->text;
+		desc->dropped = con->dropped;
+		cons_print_dropped(desc);
+		con->dropped = desc->dropped;
 	}
 
-	desc->len = len;
-	desc->outbuf = write_text;
 	return true;
 }
 
@@ -2800,8 +2824,10 @@ static bool console_emit_next_record(str
 	printk_safe_enter_irqsave(flags);
 	console_lock_spinning_enable();
 
-	stop_critical_timings();	/* don't trace print latency */
-	call_console_driver(con, desc.outbuf, desc.len, extmsg ? NULL : txtbuf->dropped_text);
+	/* don't trace print latency */
+	stop_critical_timings();
+	/* Write everything out to the hardware */
+	con->write(con, desc.outbuf, desc.len);
 	start_critical_timings();
 
 	con->seq++;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (17 preceding siblings ...)
  2022-09-10 22:27 ` [patch RFC 18/29] printk: Handle dropped message smarter Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-11-07 15:58   ` functionality: was: " Petr Mladek
  2022-11-07 16:10   ` cosmetic: " Petr Mladek
  2022-09-10 22:28 ` [patch RFC 20/29] printk: Add non-BKL console acquire/release logic Thomas Gleixner
                   ` (13 subsequent siblings)
  32 siblings, 2 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

The current console/printk subsystem is protected by a Big Kernel Lock,
aka. console_lock which has has ill defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and makes
forced takeover and output in emergency and panic situations a fragile
endavour which is based on try and pray.

The goal of non-BKL consoles is to break out of the console lock jail and
to provide a new infrastructure which avoids the pitfalls and allows
console drivers to be gradually converted over.

The proposed infrastructure aims for the following properties:

  - Lockless (SCRU protected) console list walk
  - Per console locking instead of global locking
  - Per console state which allows to make informed decisions
  - Stateful handover and takeover

As a first step this adds state to struct console. The per console state is
a atomic_long_t with a 32bit bit field and on 64bit a 32bit sequence for
tracking the last printed ringbuffer sequence number. On 32bit the sequence
is seperate from state for obvious reasons which requires to handle a few
extra race conditions.

Add the initial state with the most basic 'alive' and 'enabled' bits and
wire it up into the console register/unregister functionality and exclude
such consoles from being handled in the console BKL mechanisms.

The decision to use a bitfield was made as using a plain u32 and mask/shift
operations turned out to result in uncomprehensible code.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 fs/proc/consoles.c           |    1 
 include/linux/console.h      |   38 +++++++++
 kernel/printk/printk.c       |   21 ++++-
 kernel/printk/printk_nobkl.c |  176 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 235 insertions(+), 1 deletion(-)

--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -21,6 +21,7 @@ static int show_console_dev(struct seq_f
 		{ CON_ENABLED,		'E' },
 		{ CON_CONSDEV,		'C' },
 		{ CON_BOOT,		'B' },
+		{ CON_NO_BKL,		'N' },
 		{ CON_PRINTBUFFER,	'p' },
 		{ CON_BRL,		'b' },
 		{ CON_ANYTIME,		'a' },
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -161,6 +161,8 @@ static inline int con_debug_leave(void)
  *			printk spam for obvious reasons
  * @CON_EXTENDED:	The console supports the extended output format of /dev/kmesg
  *			which requires a larger output record buffer
+ * @CON_NO_BKL:		Console can operate outside of the BKL style console_lock
+ *			constraints.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -170,6 +172,37 @@ enum cons_flags {
 	CON_ANYTIME		= BIT(4),
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
+	CON_NO_BKL		= BIT(7),
+};
+
+/**
+ * struct cons_state - console state for NOBKL consoles
+ * @atom:	Compound of the state fields for atomic operations
+ * @seq:	Sequence for record tracking (64bit only)
+ * @bits:	Compound of the state bits below
+ *
+ * @alive:	Console is alive. Required for teardown
+ * @enabled:	Console is enabled. If 0, do not use
+ *
+ * To be used for state read and preparation of atomic_long_cmpxchg()
+ * operations.
+ */
+struct cons_state {
+	union {
+		unsigned long	atom;
+		struct {
+#ifdef CONFIG_64BIT
+			u32	seq;
+#endif
+			union {
+				u32	bits;
+				struct {
+					u32 alive	:  1;
+					u32 enabled	:  1;
+				};
+			};
+		};
+	};
 };
 
 /**
@@ -218,6 +251,8 @@ struct cons_outbuf_desc {
  * @dropped:		Number of dropped ringbuffer records
  * @data:		Driver private data
  * @node:		hlist node for the console list
+ *
+ * @atomic_state:	State array for non-BKL consoles. Real and handover
  */
 struct console {
 	char			name[16];
@@ -237,6 +272,9 @@ struct console {
 	unsigned long		dropped;
 	void			*data;
 	struct hlist_node	node;
+
+	/* NOBKL console specific members */
+	atomic_long_t __private	atomic_state[2];
 };
 
 #ifdef CONFIG_LOCKDEP
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2339,7 +2339,9 @@ static bool suppress_message_printing(in
 static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
 
-#endif /* CONFIG_PRINTK */
+#endif /* !CONFIG_PRINTK */
+
+#include "printk_nobkl.c"
 
 #ifdef CONFIG_EARLY_PRINTK
 struct console *early_console;
@@ -2635,6 +2637,13 @@ static bool abandon_console_lock_in_pani
  */
 static inline bool console_is_usable(struct console *con)
 {
+	/*
+	 * Exclude the NOBKL consoles. They are handled seperately
+	 * as they do not require the console BKL
+	 */
+	if ((con->flags & CON_NO_BKL))
+		return false;
+
 	if (!(con->flags & CON_ENABLED))
 		return false;
 
@@ -3079,7 +3088,10 @@ void console_stop(struct console *consol
 	console_list_lock();
 	console_lock();
 	console->flags &= ~CON_ENABLED;
+	cons_state_disable(console);
 	console_unlock();
+	/* Ensure that all SRCU list walks have completed */
+	synchronize_srcu(&console_srcu);
 	console_list_unlock();
 }
 EXPORT_SYMBOL(console_stop);
@@ -3089,6 +3101,7 @@ void console_start(struct console *conso
 	console_list_lock();
 	console_lock();
 	console->flags |= CON_ENABLED;
+	cons_state_enable(console);
 	console_unlock();
 	console_list_unlock();
 	__pr_flush(console, 1000, true);
@@ -3276,6 +3289,9 @@ void register_console(struct console *ne
 		newcon->flags &= ~CON_PRINTBUFFER;
 	}
 
+	/* Initialize the nobkl data in @newcon */
+	cons_nobkl_init(newcon);
+
 	/*
 	 * Put this console in the list and keep the referred driver at the
 	 * head of the list.
@@ -3342,6 +3358,7 @@ static int console_unregister_locked(str
 
 	/* Disable it unconditionally */
 	console->flags &= ~CON_ENABLED;
+	cons_state_disable(console);
 
 	if (hlist_unhashed(&console->node))
 		goto out_unlock;
@@ -3365,6 +3382,8 @@ static int console_unregister_locked(str
 	/* Ensure that all SRCU list walks have completed */
 	synchronize_srcu(&console_srcu);
 
+	cons_nobkl_cleanup(console);
+
 	console_sysfs_notify();
 
 	if (console->exit)
--- /dev/null
+++ b/kernel/printk/printk_nobkl.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2022 Linutronix GmbH, John Ogness
+// Copyright (C) 2022 Intel, Thomas Gleixner
+
+/*
+ * Printk implementation for consoles which do not depend on the BKL style
+ * console_lock() mechanism.
+ *
+ * Console is locked on a CPU when state::locked is set and state:cpu ==
+ * current CPU. This is valid for the current execution context.
+ *
+ * Nesting execution contexts on the same CPU can carefully take over
+ * if the driver allows reentrancy via state::unsafe = false. When the
+ * interrupted context resumes it checks the state before entering
+ * a unsafe region and aborts the operation it it detects the takeover.
+ *
+ * In case of panic or emergency the nesting context can take over the
+ * console forcefully. The write callback is then invoked with the unsafe
+ * flag set in the write context data which allows the driver side to avoid
+ * locks and to evaluate the driver state so it can use an emergency path or
+ * repair the state instead of blindly assuming that it works.
+ *
+ * If the interrupted context touches the assigned record buffer after
+ * takeover that does not cause harm because at the same execution level
+ * there is no concurrency on the same CPU. A threaded printer has always
+ * its own record buffer so it can never interfere with any of the per CPU
+ * record buffers.
+ *
+ * A concurrent writer on a different CPU can request to take over the
+ * console by:
+ *
+ *	1) Carefully writing the desired state into state[HANDOVER]
+ *	   if there is no same or higher priority request pending
+ *	   This locks state[HANDOVER] except for higher priority
+ *	   waiters.
+ *
+ *	2) Setting state[REAL].req_prio unless a higher priority
+ *	   waiter won the race.
+ *
+ *	3) Carefully spin on state[REAL] until that is locked with the
+ *	   expected state. When the state is not the expected one then it
+ *	   has to verify that state[HANDOVER] is still the same and that
+ *	   state[REAL] has not been taken over or marked dead.
+ *
+ *      The unlocker hands over to state[HANDOVER], but only if state[REAL]
+ *	matches.
+ *
+ * In case that the owner does not react on the request and does not make
+ * observable progress, the caller can decide to do a hostile take over.
+ */
+
+#ifdef CONFIG_PRINTK
+
+#define copy_full_state(_dst, _src)	do { _dst = _src; } while(0)
+#define copy_bit_state(_dst, _src)	do { _dst.bits = _src.bits; } while(0)
+
+#ifdef CONFIG_64BIT
+#define copy_seq_state64(_dst, _src)	do { _dst.seq = _src.seq; } while(0)
+#else
+#define copy_seq_state64(_dst, _src)	do { } while(0)
+#endif
+
+enum state_selector {
+	STATE_REAL,
+	STATE_HANDOVER,
+};
+
+/**
+ * cons_state_set - Helper function to set the console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @new:	The new state to write
+ *
+ * Only to be used when the console is not yet or not longer visible in the
+ * system.
+ */
+static inline void cons_state_set(struct console *con, enum state_selector which,
+				  struct cons_state *new)
+{
+	atomic_long_set(&ACCESS_PRIVATE(con, atomic_state[which]), new->atom);
+}
+
+/**
+ * cons_state_read - Helper function to read the console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @state:	The state to store the result
+ */
+static inline void cons_state_read(struct console *con, enum state_selector which,
+				   struct cons_state *state)
+{
+	state->atom = atomic_long_read(&ACCESS_PRIVATE(con, atomic_state[which]));
+}
+
+/**
+ * cons_state_try_cmpxchg() - Helper function for atomic_long_try_cmpxchg() on console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @old:	Old state
+ * @new:	New state
+ *
+ * Returns: True on success, false on fail
+ */
+static inline bool cons_state_try_cmpxchg(struct console *con,
+					  enum state_selector which,
+					  struct cons_state *old,
+					  struct cons_state *new)
+{
+	return atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_state[which]),
+				       &old->atom, new->atom);
+}
+
+/**
+ * cons_state_mod_enabled - Helper function to en/disable a console
+ * @con:	Console to modify
+ */
+static void cons_state_mod_enabled(struct console *con, bool enable)
+{
+	struct cons_state old, new;
+
+	cons_state_read(con, STATE_REAL, &old);
+	do {
+		copy_full_state(new, old);
+		new.enabled = enable;
+	} while (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new));
+}
+
+/**
+ * cons_state_disable - Helper function to disable a console
+ * @con:	Console to disable
+ */
+static void cons_state_disable(struct console *con)
+{
+	cons_state_mod_enabled(con, false);
+}
+
+/**
+ * cons_state_enable - Helper function to enable a console
+ * @con:	Console to enable
+ */
+static void cons_state_enable(struct console *con)
+{
+	cons_state_mod_enabled(con, true);
+}
+
+/**
+ * cons_nobkl_init - Initialize the NOBKL console state
+ * @con:	Console to initialize
+ */
+static void cons_nobkl_init(struct console *con)
+{
+	struct cons_state state = {
+		.alive = 1,
+		.enabled = !!(con->flags & CON_ENABLED),
+	};
+
+	cons_state_set(con, STATE_REAL, &state);
+}
+
+/**
+ * cons_nobkl_cleanup - Cleanup the NOBKL console state
+ * @con:	Console to cleanup
+ */
+static void cons_nobkl_cleanup(struct console *con)
+{
+	struct cons_state state = { };
+
+	cons_state_set(con, STATE_REAL, &state);
+}
+
+#else /* CONFIG_PRINTK */
+static inline void cons_nobkl_init(struct console *con) { }
+static inline void cons_nobkl_cleanup(struct console *con) { }
+static inline void cons_state_disable(struct console *con) { }
+static inline void cons_state_enable(struct console *con) { }
+#endif /* !CONFIG_PRINTK */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 20/29] printk: Add non-BKL console acquire/release logic
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (18 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-27 13:49   ` John Ogness
  2022-09-10 22:28 ` [patch RFC 21/29] printk: Add buffer management for noBKL consoles Thomas Gleixner
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

Add per console acquire/release functionality. The console 'locked' state
is a combination of several state fields:

  - The 'locked' bit

  - The 'cpu' field which denotes on which CPU the console is locked
  
  - The 'cur_prio' field which contains the severity of the printk context
    which owns the console. This field is used for decisions whether to
    attempt friendly handovers and also prevents takeovers from a less
    severe context, e.g. to protect the panic CPU.

The acquire mechanism comes with several flavours:

  - Straight forward acquire when the console is not contended

  - Friendly handover mechanism based on a request/grant handshake

    The requesting context:

      1) puts the desired handover state (CPU nr, prio) into a seperate
         handover state

      2) sets the 'req_prio' field in the real console state

      3) Waits for the owning context to handover with a timeout

    The owning context:

      1) Observes the 'req_prio' field set

      2) Hands the console over to the requesting context by switching
      	 the console state to the handover state which the requester
	 provided
    
  - Hostile takeover

      The new owner takes the console over without handshake

      This is required when friendly handovers are not possible, i.e. the
      higher priority context interrupted the owning context on the same
      CPU or the owning context is not able to make progress on a remote
      CPU.

The release is the counterpart which either releases the console directly
or hands it gracefully over to a requester.

All operations on console::atomic_state[REAL|HANDOVER} are atomic cmpxchg
based to handle concurrency.

The acquire/release functions implement only minimal policies:

  - Preference for higher priority contexts
  - Protection of the panic CPU

All other policy decisions have to be made at the call sites.

The design allows to implement the well known:

    acquire()
    output_one_line()
    release()

algorithm, but also allows to avoid the per line acquire/release for
e.g. panic situations by doing the acquire once and then relying on
the panic CPU protection for the rest.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |   63 +++++
 kernel/printk/printk_nobkl.c |  466 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 529 insertions(+)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -183,6 +183,12 @@ enum cons_flags {
  *
  * @alive:	Console is alive. Required for teardown
  * @enabled:	Console is enabled. If 0, do not use
+ * @locked:	Console is locked by a writer
+ * @unsafe:	Console is busy in a non takeover region
+ * @thread:	Current owner is the printk thread
+ * @cur_prio:	The priority of the current output
+ * @req_prio:	The priority of a handover request
+ * @cpu:	The CPU on which the writer runs
  *
  * To be used for state read and preparation of atomic_long_cmpxchg()
  * operations.
@@ -199,6 +205,12 @@ struct cons_state {
 				struct {
 					u32 alive	:  1;
 					u32 enabled	:  1;
+					u32 locked	:  1;
+					u32 unsafe	:  1;
+					u32 thread	:  1;
+					u32 cur_prio	:  2;
+					u32 req_prio	:  2;
+					u32 cpu		: 18;
 				};
 			};
 		};
@@ -233,6 +245,56 @@ struct cons_outbuf_desc {
 };
 
 /**
+ * cons_prio - console writer priority for NOBKL consoles
+ * @CONS_PRIO_NONE:		Unused
+ * @CONS_PRIO_NORMAL:		Regular printk
+ * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
+ * @CONS_PRIO_PANIC:		Panic output
+ *
+ * Emergency output can carefully takeover the console even without consent
+ * of the owner, ideally only when @cons_state::unsafe is not set. Panic
+ * output can ignore the unsafe flag as a last resort. If panic output is
+ * active no takeover is possible until the panic output releases the
+ * console.
+ */
+enum cons_prio {
+	CONS_PRIO_NONE = 0,
+	CONS_PRIO_NORMAL,
+	CONS_PRIO_EMERGENCY,
+	CONS_PRIO_PANIC,
+};
+
+struct console;
+
+/**
+ * struct cons_context - Context for console acquire/release
+ * @console:		The associated console
+ * @state:		The state at acquire time
+ * @old_state:		The old state when try_acquire() failed for analyis
+ *			by the caller
+ * @hov_state:		The handover state for spin and cleanup
+ * @req_state:		The request state for spin and cleanup
+ * @spinwait_max_us:	Limit for spinwait acquire
+ * @prio:		Priority of the context
+ * @thread:		The acquire is printk thread context
+ * @hostile:		Hostile takeover requested. Cleared on normal
+ *			acquire or friendly handover
+ * @spinwait:		Spinwait on acquire if possible
+ */
+struct cons_context {
+	struct console		*console;
+	struct cons_state	state;
+	struct cons_state	old_state;
+	struct cons_state	hov_state;
+	struct cons_state	req_state;
+	unsigned int		spinwait_max_us;
+	enum cons_prio		prio;
+	unsigned int		thread		: 1;
+	unsigned int		hostile		: 1;
+	unsigned int		spinwait	: 1;
+};
+
+/**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
  * @write:		Write callback to output messages (Optional)
@@ -275,6 +337,7 @@ struct console {
 
 	/* NOBKL console specific members */
 	atomic_long_t __private	atomic_state[2];
+
 };
 
 #ifdef CONFIG_LOCKDEP
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -144,6 +144,472 @@ static void cons_state_enable(struct con
 }
 
 /**
+ * cons_state_ok - Check whether state is ok for usage
+ * @state:	The state to check
+ *
+ * Returns: True if usable, false otherwise.
+ */
+static inline bool cons_state_ok(struct cons_state state)
+{
+	return state.alive && state.enabled;
+}
+
+/**
+ * cons_state_full_match - Check whether the full state matches
+ * @cur:	The state to check
+ * @prev:	The previous state
+ *
+ * Returns: True if matching, false otherwise.
+ *
+ * Check the full state including state::seq on 64bit. For take over
+ * detection.
+ */
+static inline bool cons_state_full_match(struct cons_state cur,
+					 struct cons_state prev)
+{
+	/*
+	 * req_prio can be set by a concurrent writer for friendly
+	 * handover. Ignore it in the comparison.
+	 */
+	cur.req_prio = prev.req_prio;
+	return cur.atom == prev.atom;
+}
+
+/**
+ * cons_state_bits_match - Check for matching state bits
+ * @cur:	The state to check
+ * @prev:	The previous state
+ *
+ * Returns: True if state matches, false otherwise.
+ *
+ * Contrary to cons_state_full_match this checks only the bits and ignores
+ * a sequence change on 64bits. On 32bit the two functions are identical.
+ */
+static inline bool cons_state_bits_match(struct cons_state cur,
+					  struct cons_state prev)
+{
+	/*
+	 * req_prio can be set by a concurrent writer for friendly
+	 * handover. Ignore it in the comparison.
+	 */
+	cur.req_prio = prev.req_prio;
+	return cur.bits == prev.bits;
+}
+
+/**
+ * cons_check_panic - Check whether a remote CPU paniced
+ */
+static inline bool cons_check_panic(void)
+{
+	unsigned int pcpu = atomic_read(&panic_cpu);
+
+	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
+}
+
+/**
+ * cons_cleanup_handover - Cleanup a handover request
+ * @ctxt:	Pointer to acquire context
+ *
+ * @ctxt->hov_state contains the state to clean up
+ */
+static void cons_cleanup_handover(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state new;
+
+	/*
+	 * No loop required. Either hov_state is still the same or
+	 * not.
+	 */
+	new.atom = 0;
+	cons_state_try_cmpxchg(con, STATE_HANDOVER, &ctxt->hov_state, &new);
+}
+
+/**
+ * cons_setup_handover - Setup a handover request
+ * @ctxt:	Pointer to acquire context
+ *
+ * On success @ctxt->hov_state contains the requested handover state
+ */
+static bool cons_setup_handover(struct cons_context *ctxt)
+{
+	unsigned int cpu = smp_processor_id();
+	struct console *con = ctxt->console;
+	struct cons_state old;
+	struct cons_state hstate = {
+		.alive		= 1,
+		.enabled	= 1,
+		.locked		= 1,
+		.cur_prio	= ctxt->prio,
+		.cpu		= cpu,
+	};
+
+	/*
+	 * Try to store hstate in @con->atomic_state[HANDOVER]. This might
+	 * race with a higher priority waiter.
+	 */
+	cons_state_read(con, STATE_HANDOVER, &old);
+	do {
+		if (cons_check_panic())
+			return false;
+
+		/* Same or higher priority waiter exists? */
+		if (old.cur_prio >= ctxt->prio)
+			return false;
+
+	} while (!cons_state_try_cmpxchg(con, STATE_HANDOVER, &old, &hstate));
+
+	copy_full_state(ctxt->hov_state, hstate);
+	return true;
+}
+
+/**
+ * cons_setup_request - Setup a handover request in state[REAL]
+ * @ctxt:	Pointer to acquire context
+ * @old:	The state which was used to make the decision to spin wait
+ *
+ * @ctxt->hov_state contains the handover state which was set in
+ * state[HANDOVER]
+ */
+static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
+{
+	struct console *con = ctxt->console;
+	struct cons_state cur, new;
+
+	/* Now set the request in state[REAL] */
+	cons_state_read(con, STATE_REAL, &cur);
+	do {
+		if (cons_check_panic())
+			goto cleanup;
+
+		/* Bit state changed vs. the decision to spinwait? */
+		if (!cons_state_bits_match(cur, old))
+			goto cleanup;
+
+		/* Setup a request for handover. */
+		copy_full_state(new, cur);
+		new.req_prio = ctxt->prio;
+	} while (!cons_state_try_cmpxchg(con, STATE_REAL, &cur, &new));
+
+	/* Safe that state for comparision in spinwait */
+	copy_bit_state(ctxt->req_state, new);
+	return true;
+
+cleanup:
+	cons_cleanup_handover(ctxt);
+	return false;
+}
+
+/**
+ * cons_try_acquire_spin - Complete the spinwait attempt
+ * @ctxt:	Pointer to an aquire context which contains
+ *		all information about the acquire mode
+ *
+ * @ctxt->hov_state contains the handover state which was set in
+ * state[HANDOVER]
+ * @ctxt->req_state contains the request state which was set in
+ * state[REAL]
+ *
+ * Returns: True if locked. False otherwise
+ */
+static bool cons_try_acquire_spin(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state cur, new;
+	bool ret = false;
+	int timeout;
+
+	/* Now wait for the other side to hand over */
+	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
+		if (cons_check_panic())
+			goto cleanup;
+
+		cons_state_read(con, STATE_REAL, &cur);
+		/*
+		 * This might have raced with a new requester coming in
+		 * after the lock was handed over. So the request pends now
+		 * for the current context with higher priority.
+		 */
+		if (cons_state_bits_match(cur, ctxt->hov_state))
+			goto success;
+
+		/*
+		 * When state changed since the request was made give up as
+		 * it is not longer consistent. This must include
+		 * state::req_prio.
+		 */
+		if (cur.bits != ctxt->req_state.bits)
+			goto cleanup;
+
+		/*
+		 * Finally check whether the handover state is still
+		 * the same.
+		 */
+		cons_state_read(con, STATE_HANDOVER, &cur);
+		if (cur.atom != ctxt->hov_state.atom)
+			goto cleanup;
+
+		/* Account time */
+		udelay(1);
+	}
+
+	/*
+	 * Timeout. Cleanup the handover state and carefully try to undo
+	 * req_prio in real state.
+	 */
+	cons_cleanup_handover(ctxt);
+
+	cons_state_read(con, STATE_REAL, &cur);
+	do {
+		/*
+		 * The timeout might have raced with the owner coming late
+		 * and handing it over gracefully.
+		 */
+		if (cur.bits == ctxt->hov_state.bits)
+			goto success;
+		/*
+		 * Validate that the state matches with the state at
+		 * request time.
+		 */
+		if (cur.bits != ctxt->req_state.bits)
+			return false;
+
+		copy_full_state(new, cur);
+		new.req_prio = 0;
+	} while (!cons_state_try_cmpxchg(con, STATE_REAL, &cur, &new));
+	/* Reset worked */
+	return false;
+
+success:
+	/* Store the real state */
+	copy_full_state(ctxt->state, cur);
+	ctxt->hostile = false;
+	ret = true;
+
+cleanup:
+	cons_cleanup_handover(ctxt);
+	return ret;
+}
+
+/**
+ * __cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt:	Pointer to an aquire context which contains
+ *		all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False on fail.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of fail @ctxt->old_state contains the state
+ * which was read from @con->state for analysis by the caller.
+ */
+static bool __cons_try_acquire(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	unsigned int cpu = smp_processor_id();
+	struct cons_state old, new;
+
+	if (WARN_ON_ONCE(!(con->flags & CON_NO_BKL)))
+		return false;
+
+	cons_state_read(con, STATE_REAL, &old);
+
+again:
+	if (cons_check_panic())
+		return false;
+
+	/* Preserve it for the caller and for spinwait */
+	copy_full_state(ctxt->old_state, old);
+
+	if (!cons_state_ok(old))
+		return false;
+
+	/* Set up the new state for takeover */
+	copy_full_state(new, old);
+	new.locked = 1;
+	new.thread = ctxt->thread;
+	new.cur_prio = ctxt->prio;
+	new.req_prio = CONS_PRIO_NONE;
+	new.cpu = cpu;
+
+	/* Attempt to acquire it directly if unlocked */
+	if (!old.locked) {
+		if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
+			goto again;
+
+		ctxt->hostile = false;
+		copy_full_state(ctxt->state, new);
+		goto success;
+	}
+
+	/*
+	 * Give up if the calling context is the printk thread. The
+	 * atomic writer will wake the thread when it is done with
+	 * the important output.
+	 */
+	if (ctxt->thread)
+		return false;
+
+	/*
+	 * If the active context is on the same CPU then there is
+	 * obviously no handshake possible.
+	 */
+	if (old.cpu == cpu)
+		goto check_hostile;
+
+	/*
+	 * If the caller did not request spin-waiting or a request with the
+	 * same or higher priority is pending then check whether a hostile
+	 * takeover is due.
+	 */
+	if (!ctxt->spinwait || old.req_prio >= ctxt->prio)
+		goto check_hostile;
+
+	/* Proceed further with spin acquire */
+	if (!cons_setup_handover(ctxt))
+		return false;
+
+	/*
+	 * Setup the request in state[REAL]. Hand in the state, which was
+	 * used to make the decision to spinwait above, for comparison.
+	 */
+	if (!cons_setup_request(ctxt, old))
+		return false;
+
+	/* Now spin on it */
+	if (!cons_try_acquire_spin(ctxt))
+		return false;
+success:
+	/* Common updates on success */
+	return true;
+
+check_hostile:
+	if (!ctxt->hostile)
+		return false;
+
+	if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
+		goto again;
+
+	ctxt->hostile = true;
+	copy_full_state(ctxt->state, new);
+	goto success;
+}
+
+/**
+ * cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt:	Pointer to an aquire context which contains
+ *		all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False on fail.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of fail @ctxt->old_state contains the state
+ * which was read from @con->state for analysis by the caller.
+ */
+static bool __maybe_unused cons_try_acquire(struct cons_context *ctxt)
+{
+	if (__cons_try_acquire(ctxt))
+		return true;
+
+	ctxt->state.atom = 0;
+	return false;
+}
+
+/**
+ * __cons_release - Release the console after output is done
+ * @ctxt:	The acquire context which contains the state
+ *		at cons_try_acquire()
+ *
+ * Returns:	True if the release was regular
+ *
+ *		False if the console is in unusable state or was handed over
+ *		with handshake or taken	over hostile without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool __cons_release(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state old, new, hstate;
+
+	if (WARN_ON_ONCE(!(con->flags & CON_NO_BKL)))
+		return false;
+
+	cons_state_read(con, STATE_REAL, &old);
+
+again:
+	if (!cons_state_full_match(old, ctxt->state))
+		return false;
+
+	/*
+	 * Release it directly when:
+	 * - the console has been disabled
+	 * - no handover request is pending
+	 */
+	if (!cons_state_ok(old) || !old.req_prio)
+		goto unlock;
+
+	/* Read the handover target state */
+	cons_state_read(con, STATE_HANDOVER, &hstate);
+
+	/* If the waiter gave up hstate is 0 */
+	if (!hstate.atom)
+		goto unlock;
+
+	/*
+	 * If a higher priority waiter raced against a lower priority
+	 * waiter then wait for it to update the real state.
+	 */
+	if (hstate.cur_prio != old.req_prio)
+		goto again;
+
+	/* Switch the state and preserve the sequence on 64bit */
+	copy_bit_state(new, hstate);
+	copy_seq_state64(new, old);
+	if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
+		goto again;
+
+	return true;
+
+unlock:
+	copy_full_state(new, old);
+	new.locked = 0;
+	new.thread = 0;
+	new.cur_prio = CONS_PRIO_NONE;
+	new.req_prio = CONS_PRIO_NONE;
+
+	if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
+		goto again;
+
+	return true;
+}
+
+/**
+ * cons_release - Release the console after output is done
+ * @ctxt:	The acquire context which contains the state
+ *		at cons_try_acquire()
+ *
+ * Returns:	True if the release was regular
+ *
+ *		False if the console is in unusable state or was handed over
+ *		with handshake or taken	over hostile without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool __maybe_unused cons_release(struct cons_context *ctxt)
+{
+	bool ret = __cons_release(ctxt);
+
+	ctxt->state.atom = 0;
+	return ret;
+}
+
+/**
  * cons_nobkl_init - Initialize the NOBKL console state
  * @con:	Console to initialize
  */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 21/29] printk: Add buffer management for noBKL consoles
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (19 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 20/29] printk: Add non-BKL console acquire/release logic Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 22/29] printk: Add sequence handling for non-BKL consoles Thomas Gleixner
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

In case of hostile takeovers it must be ensured that the previous owner
cannot scribble over the output buffer of the emergency/panic context. This
is achieved by:

 - Allocating per CPU output buffers per console and add the required handling
   into the acquire/release functions.

 - Adding a single instance to struct console for early boot (pre per CPU
   data being available). The builtin instance is also used for threaded
   printing once printer threads become available.

Wrapped into a seperate data structure so other context related fields can
be added in later steps.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |   21 ++++++++++++-
 kernel/printk/printk.c       |   18 ++++++++---
 kernel/printk/printk_nobkl.c |   69 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 6 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -276,6 +276,7 @@ struct console;
  * @req_state:		The request state for spin and cleanup
  * @spinwait_max_us:	Limit for spinwait acquire
  * @prio:		Priority of the context
+ * @txtbuf:		Pointer to the text buffer for this context
  * @thread:		The acquire is printk thread context
  * @hostile:		Hostile takeover requested. Cleared on normal
  *			acquire or friendly handover
@@ -289,11 +290,25 @@ struct cons_context {
 	struct cons_state	req_state;
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
+	struct cons_text_buf	*txtbuf;
 	unsigned int		thread		: 1;
 	unsigned int		hostile		: 1;
 	unsigned int		spinwait	: 1;
 };
 
+#define CONS_MAX_NEST_LVL	8
+
+/**
+ * struct cons_context_data - console context data
+ * @txtbuf:		Buffer for storing the text
+ *
+ * Used for early boot embedded into struct console and for
+ * per CPU data.
+ */
+struct cons_context_data {
+	struct cons_text_buf		txtbuf;
+};
+
 /**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
@@ -315,6 +330,8 @@ struct cons_context {
  * @node:		hlist node for the console list
  *
  * @atomic_state:	State array for non-BKL consoles. Real and handover
+ * @pcpu_data:		Pointer to percpu context data
+ * @ctxt_data:		Builtin context data for early boot and threaded printing
  */
 struct console {
 	char			name[16];
@@ -336,8 +353,10 @@ struct console {
 	struct hlist_node	node;
 
 	/* NOBKL console specific members */
-	atomic_long_t __private	atomic_state[2];
+	atomic_long_t __private		atomic_state[2];
 
+	struct cons_context_data __percpu	*pcpu_data;
+	struct cons_context_data		ctxt_data;
 };
 
 #ifdef CONFIG_LOCKDEP
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1071,9 +1071,17 @@ static void __init log_buf_add_cpu(void)
 static inline void log_buf_add_cpu(void) {}
 #endif /* CONFIG_SMP */
 
+static void cons_alloc_percpu_data(struct console *con);
+
 static void __init set_percpu_data_ready(void)
 {
+	struct console *con;
+
+	console_list_lock();
+	for_each_registered_console(con)
+		cons_alloc_percpu_data(con);
 	__printk_percpu_data_ready = true;
+	console_list_unlock();
 }
 
 static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
@@ -2341,6 +2349,11 @@ static bool __pr_flush(struct console *c
 
 #endif /* !CONFIG_PRINTK */
 
+#define con_printk(lvl, con, fmt, ...)			\
+	printk(lvl pr_fmt("%sconsole [%s%d] " fmt),	\
+	       (con->flags & CON_BOOT) ? "boot" : "",	\
+	       con->name, con->index, ##__VA_ARGS__)
+
 #include "printk_nobkl.c"
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -3191,11 +3204,6 @@ static void try_enable_default_console(s
 		newcon->flags |= CON_CONSDEV;
 }
 
-#define con_printk(lvl, con, fmt, ...)			\
-	printk(lvl pr_fmt("%sconsole [%s%d] " fmt),	\
-	       (con->flags & CON_BOOT) ? "boot" : "",	\
-	       con->name, con->index, ##__VA_ARGS__)
-
 #define cons_first()					\
 	hlist_entry(console_list.first, struct console, node)
 
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -207,6 +207,43 @@ static inline bool cons_check_panic(void
 }
 
 /**
+ * cons_context_set_text_buf - Set the output text buffer for the current context
+ * @ctxt:	Pointer to the aquire context
+ *
+ * Buffer selection:
+ *   1) Early boot uses the console builtin buffer
+ *   2) Threads use the console builtin buffer
+ *   3) All other context use the per CPU buffers
+ *
+ * This guarantees that there is no concurrency on the output records
+ * ever. Per CPU nesting is not a problem at all. The takeover logic
+ * tells the interrupted context that the buffer has been overwritten.
+ *
+ * There are two critical regions which matter:
+ *
+ * 1) Context is filling the buffer with a record. After interruption
+ *    it continues to sprintf() the record and before it goes to
+ *    write it out, it checks the state, notices the takeover, discards
+ *    the content and backs out.
+ *
+ * 2) Context is in a unsafe critical region in the driver. After
+ *    interruption it might read overwritten data from the output
+ *    buffer. When it leaves the critical region it notices and backs
+ *    out. Hostile takeovers in driver critical regions are best effort
+ *    and there is not much which can be done about that.
+ */
+static void cons_context_set_text_buf(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+
+	/* Early boot or allocation fail? */
+	if (!con->pcpu_data)
+		ctxt->txtbuf = &con->ctxt_data.txtbuf;
+	else
+		ctxt->txtbuf = &(this_cpu_ptr(con->pcpu_data)->txtbuf);
+}
+
+/**
  * cons_cleanup_handover - Cleanup a handover request
  * @ctxt:	Pointer to acquire context
  *
@@ -482,6 +519,7 @@ static bool __cons_try_acquire(struct co
 		return false;
 success:
 	/* Common updates on success */
+	cons_context_set_text_buf(ctxt);
 	return true;
 
 check_hostile:
@@ -610,6 +648,35 @@ static bool __maybe_unused cons_release(
 }
 
 /**
+ * cons_alloc_percpu_data - Allocate percpu data for a console
+ * @con:	Console to allocate for
+ */
+static void cons_alloc_percpu_data(struct console *con)
+{
+	if (!printk_percpu_data_ready())
+		return;
+
+	con->pcpu_data = alloc_percpu(typeof(*con->pcpu_data));
+	if (con->pcpu_data)
+		return;
+
+	con_printk(KERN_WARNING, con, "Failed to allocate percpu buffers\n");
+}
+
+/**
+ * cons_free_percpu_data - Free percpu data of a console on unregister
+ * @con:	Console to clean up
+ */
+static void cons_free_percpu_data(struct console *con)
+{
+	if (!con->pcpu_data)
+		return;
+
+	free_percpu(con->pcpu_data);
+	con->pcpu_data = NULL;
+}
+
+/**
  * cons_nobkl_init - Initialize the NOBKL console state
  * @con:	Console to initialize
  */
@@ -620,6 +687,7 @@ static void cons_nobkl_init(struct conso
 		.enabled = !!(con->flags & CON_ENABLED),
 	};
 
+	cons_alloc_percpu_data(con);
 	cons_state_set(con, STATE_REAL, &state);
 }
 
@@ -632,6 +700,7 @@ static void cons_nobkl_cleanup(struct co
 	struct cons_state state = { };
 
 	cons_state_set(con, STATE_REAL, &state);
+	cons_free_percpu_data(con);
 }
 
 #else /* CONFIG_PRINTK */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 22/29] printk: Add sequence handling for non-BKL consoles
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (20 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 21/29] printk: Add buffer management for noBKL consoles Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 23/29] printk: Add non-BKL console print state functions Thomas Gleixner
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

On 64bit systems the sequence tracking is embedded into the atomic console
state, on 32bit it has to be stored in a seperate atomic member. The latter
needs to handle the non-atomicity in hostile takeover cases, while 64bit can
completely rely on the state atomicity.

The ringbuffer sequence number is 64bit, but having a 32bit representation
in the console is sufficient. If a console ever gets more than 2^31 records
behind the ringbuffer then this is the least of the problems.

On acquire() the stomic 32bit sequence number is expanded to 64 bit by
folding the ringbuffers sequence into it carefully.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |   11 +-
 kernel/printk/printk_nobkl.c |  204 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 212 insertions(+), 3 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -275,6 +275,8 @@ struct console;
  * @hov_state:		The handover state for spin and cleanup
  * @req_state:		The request state for spin and cleanup
  * @spinwait_max_us:	Limit for spinwait acquire
+ * @oldseq:		The sequence number at acquire()
+ * @newseq:		The sequence number for progress
  * @prio:		Priority of the context
  * @txtbuf:		Pointer to the text buffer for this context
  * @thread:		The acquire is printk thread context
@@ -288,6 +290,8 @@ struct cons_context {
 	struct cons_state	old_state;
 	struct cons_state	hov_state;
 	struct cons_state	req_state;
+	u64			oldseq;
+	u64			newseq;
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
 	struct cons_text_buf	*txtbuf;
@@ -330,6 +334,7 @@ struct cons_context_data {
  * @node:		hlist node for the console list
  *
  * @atomic_state:	State array for non-BKL consoles. Real and handover
+ * @atomic_seq:		Sequence for record tracking (32bit only)
  * @pcpu_data:		Pointer to percpu context data
  * @ctxt_data:		Builtin context data for early boot and threaded printing
  */
@@ -353,8 +358,10 @@ struct console {
 	struct hlist_node	node;
 
 	/* NOBKL console specific members */
-	atomic_long_t __private		atomic_state[2];
-
+	atomic_long_t __private	atomic_state[2];
+#ifndef CONFIG_64BIT
+	atomic_t __private	atomic_seq;
+#endif
 	struct cons_context_data __percpu	*pcpu_data;
 	struct cons_context_data		ctxt_data;
 };
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -51,6 +51,8 @@
 
 #ifdef CONFIG_PRINTK
 
+static bool cons_release(struct cons_context *ctxt);
+
 #define copy_full_state(_dst, _src)	do { _dst = _src; } while(0)
 #define copy_bit_state(_dst, _src)	do { _dst.bits = _src.bits; } while(0)
 
@@ -244,6 +246,205 @@ static void cons_context_set_text_buf(st
 }
 
 /**
+ * cons_forward_sequence - Helper function forward the sequence
+ * @con:	Console to work on
+ *
+ * Forward @con->atomic_seq to the oldest available record. For init
+ * only. Do not use for runtime updates.
+ */
+static void cons_forward_sequence(struct console *con)
+{
+	u32 seq = (u32)prb_first_valid_seq(prb);
+#ifdef CONFIG_64BIT
+	struct cons_state state;
+
+	cons_state_read(con, STATE_REAL, &state);
+	state.seq = seq;
+	cons_state_set(con, STATE_REAL, &state);
+#else
+	atomic_set(&ACCESS_PRIVATE(con, atomic_seq), seq);
+#endif
+}
+
+/**
+ * cons_context_sequence_init - Retrieve the last printed sequence number
+ * @ctxt:	Pointer to an aquire context which contains
+ *		all information about the acquire mode
+ *
+ * On return the retrieved sequence number is stored in ctxt->oldseq.
+ *
+ * The sequence number is safe in forceful takeover situations.
+ *
+ * Either the writer succeded to update before it got interrupted
+ * or it failed. In the latter case the takeover will print the
+ * same line again.
+ *
+ * The sequence is only the lower 32bits of the ringbuffer sequence. The
+ * ringbuffer must be 2^31 records ahead to get out of sync. This needs
+ * some care when starting a console, i.e setting the sequence to 0 is
+ * wrong. It has to be set to the oldest valid sequence in the ringbuffer
+ * as that cannot be more than 2^31 records away
+ *
+ * On 64bit the 32bit sequence is part of console::state which is saved
+ * in @ctxt->state. This prevents the 32bit update race.
+ */
+static void cons_context_sequence_init(struct cons_context *ctxt)
+{
+	u64 rbseq;
+
+#ifdef CONFIG_64BIT
+	ctxt->oldseq = ctxt->state.seq;
+#else
+	ctxt->oldseq = atomic_read(&ACCESS_PRIVATE(ctxt->console, atomic_seq));
+#endif
+
+	/*
+	 * The sequence is only the lower 32bits of the ringbuffer
+	 * sequence. So it needs to be expanded to 64bit. Get the next
+	 * sequence number from the ringbuffer and fold it.
+	 */
+	rbseq = prb_next_seq(prb);
+	ctxt->oldseq = rbseq - ((u32)rbseq - (u32)ctxt->oldseq);
+	ctxt->newseq = ctxt->oldseq;
+}
+
+/**
+ * cons_sequence_try_update - Try to update the sequence number
+ * @ctxt:	Pointer to an aquire context which contains
+ *		all information about the acquire mode
+ *
+ * Returns:	True on success
+ *		False on fail.
+ *
+ * Internal helper as the logic is different on 32bit and 64bit.
+ *
+ * On 32 bit the sequence is seperate from state and therefore
+ * subject to a subtle race in the case of hostile takeovers.
+ *
+ * On 64 bit the sequence is part of the state and therefore safe
+ * vs. hostile takeovers.
+ *
+ * In case of fail the console has been taken over and @ctxt is
+ * invalid. Caller has to reacquire the console.
+ */
+#ifdef CONFIG_64BIT
+static bool __maybe_unused cons_sequence_try_update(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state old, new;
+
+	cons_state_read(con, STATE_REAL, &old);
+	do {
+		/* Full state compare including sequence */
+		if (!cons_state_full_match(old, ctxt->state))
+			return false;
+
+		/* Preserve bit state */
+		copy_bit_state(new, old);
+		new.seq = ctxt->newseq;
+
+		/*
+		 * Can race with hostile takeover or with a handover
+		 * request.
+		 */
+	} while (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new));
+
+	copy_full_state(ctxt->state, new);
+	ctxt->oldseq = ctxt->newseq;
+
+	return true;
+}
+#else
+static bool __maybe_unused cons_sequence_try_update(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	unsigned long old, new, cur;
+	struct cons_state state;
+	int pcpu;
+
+	/*
+	 * There is a corner case which needs to be considered here:
+	 *
+	 * CPU0			CPU1
+	 * printk()
+	 *  acquire()		-> emergency
+	 *  write()		   acquire()
+	 *  update_seq()
+	 *    state == OK
+	 * --> NMI
+	 *			   takeover()
+	 * <---			     write()
+	 *  cmpxchg() succeeds	     update_seq()
+	 *			     cmpxchg() fails
+	 *
+	 * There is nothing which can be done about this other than having
+	 * yet another state bit which needs to be tracked and analyzed,
+	 * but fails to cover the problem completely.
+	 *
+	 * No other scenarios expose such a problem. On same CPU takeovers
+	 * the cmpxchg() always fails on the interrupted context after the
+	 * interrupting context finished printing, but that's fine as it
+	 * does not own the console anymore. The state check after the
+	 * failed cmpxchg prevents that.
+	 */
+	cons_state_read(con, STATE_REAL, &state);
+	/* Sequence is not part of cons_state on 32bit */
+	if (!cons_state_bits_match(state, ctxt->state))
+		return false;
+
+	/*
+	 * Get the original sequence number which was retrieved
+	 * from @con->atomic_seq. @con->atomic_seq should be still
+	 * the same. 32bit truncates. See cons_context_set_sequence().
+	 */
+	old = (unsigned long)ctxt->oldseq;
+	new = (unsigned long)ctxt->newseq;
+	cur = atomic_cmpxchg(&ACCESS_PRIVATE(con, atomic_seq), old, new);
+	if (cur == old) {
+		ctxt->oldseq = ctxt->newseq;
+		return true;
+	}
+
+	/*
+	 * Reread the state. If the state does not own the console anymore
+	 * then it cannot touch the sequence again.
+	 */
+	cons_state_read(con, STATE_REAL, &state);
+	/* Sequence is not part of cons_state on 32bit */
+	if (!cons_state_bits_match(state, ctxt->state))
+		return false;
+
+	/* If panic and not on the panic CPU, drop the lock */
+	pcpu = atomic_read(&panic_cpu);
+	if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
+		goto unlock;
+
+	if (pcpu == smp_processor_id()) {
+		/*
+		 * This is the panic CPU. Emitting a warning here does not
+		 * help at all. The callchain is clear and the priority is
+		 * to get the messages out. In the worst case duplicated
+		 * ones. That's a job for postprocessing.
+		 */
+		atomic_set(&ACCESS_PRIVATE(con, atomic_seq), new);
+		ctxt->oldseq = ctxt->newseq;
+		return true;
+	}
+
+	/*
+	 * Only emit a warning when this happens outside of a panic
+	 * situation as on panic it's neither useful nor helping to let the
+	 * panic CPU get the important stuff out.
+	 */
+	WARN_ON_ONCE(pcpu == PANIC_CPU_INVALID);
+
+unlock:
+	cons_release(ctxt);
+	return false;
+}
+#endif
+
+/**
  * cons_cleanup_handover - Cleanup a handover request
  * @ctxt:	Pointer to acquire context
  *
@@ -519,6 +720,7 @@ static bool __cons_try_acquire(struct co
 		return false;
 success:
 	/* Common updates on success */
+	cons_context_sequence_init(ctxt);
 	cons_context_set_text_buf(ctxt);
 	return true;
 
@@ -529,7 +731,6 @@ static bool __cons_try_acquire(struct co
 	if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
 		goto again;
 
-	ctxt->hostile = true;
 	copy_full_state(ctxt->state, new);
 	goto success;
 }
@@ -688,6 +889,7 @@ static void cons_nobkl_init(struct conso
 	};
 
 	cons_alloc_percpu_data(con);
+	cons_forward_sequence(con);
 	cons_state_set(con, STATE_REAL, &state);
 }
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 23/29] printk: Add non-BKL console print state functions
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (21 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 22/29] printk: Add sequence handling for non-BKL consoles Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 24/29] printk: Put seq and dropped into cons_text_desc Thomas Gleixner
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

Provide three functions which are related to the safe handover mechanism
and allow console drivers to denote takeover unsafe sections:

 - console_can_proceed()

   Invoked by a console driver to check whether a handover request is pending
   or whether the console was taken over in a hostile fashion.

 - console_enter/exit_unsafe()

   Invoked by a console driver to reflect that the driver output function is
   about to enter or to leave an critical region where a hostile take over
   is unsafe. These functions are also cancelation points.

   The unsafe state is reflected in the console state and allows a takeover
   attempt to make informed decisions whether to take over and/or output on
   such console at all. The unsafe state is also reflected on output to the
   driver in the write context to the atomic_write() output function so the
   driver can make informed decisions about the required actions or take an
   special emergency path.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |   20 ++++++
 kernel/printk/printk_nobkl.c |  134 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -300,6 +300,22 @@ struct cons_context {
 	unsigned int		spinwait	: 1;
 };
 
+/**
+ * struct cons_write_context - Context handed to the write callbacks
+ * @ctxt:	The core console context
+ * @outbuf:	Pointer to the text buffer for output
+ * @len:	Length to write
+ * @pos:	Current write position in @outbuf
+ * @unsafe:	Invoked in unsafe state due to force takeover
+ */
+struct cons_write_context {
+	struct cons_context __private	ctxt;
+	char				*outbuf;
+	unsigned int			len;
+	unsigned int			pos;
+	bool				unsafe;
+};
+
 #define CONS_MAX_NEST_LVL	8
 
 /**
@@ -423,6 +439,10 @@ extern void console_list_unlock(void) __
 #define for_each_console_kgdb(con)					\
 	hlist_for_each_entry(con, &console_list, node)
 
+extern bool console_can_proceed(struct cons_write_context *wctxt);
+extern bool console_enter_unsafe(struct cons_write_context *wctxt);
+extern bool console_exit_unsafe(struct cons_write_context *wctxt);
+
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -874,6 +874,140 @@ static void cons_free_percpu_data(struct
 }
 
 /**
+ * console_can_proceed - Check whether printing can proceed
+ * @wctxt:	The write context which was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked after the record was dumped into the assigned record
+ * buffer and at appropriate safe places in the driver.  For unsafe driver
+ * sections see console_enter_unsafe().
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully.  The buffer
+ * content is not longer trusted either and the console lock is not longer
+ * held.
+ */
+bool console_can_proceed(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct cons_state state;
+
+	cons_state_read(con, STATE_REAL, &state);
+	/* Store it for analyis or reuse */
+	copy_full_state(ctxt->old_state, state);
+
+	/*
+	 * If the state aside of req_prio is not longer matching, console
+	 * was taken over.
+	 */
+	if (!cons_state_full_match(state, ctxt->state))
+		return false;
+
+	/*
+	 * Having a safe point for take over and eventually a few
+	 * duplicated characters or a full line is way better than a
+	 * hostile takeover. Post processing can take care of the garbage.
+	 * Continue if the requested priority is not sufficient.
+	 */
+	if (state.req_prio <= state.cur_prio)
+		return true;
+
+	/* Release and hand over */
+	cons_release(ctxt);
+	/*
+	 * This does not check whether the handover succeeded. The
+	 * outermost callsite has to do the final check whether printing
+	 * should continue or not. The console is unlocked already so go
+	 * back all the way instead of trying to implement heuristics in
+	 * tons of places.
+	 */
+	return false;
+}
+
+/**
+ * __console_update_unsafe - Update the unsafe bit in @con->atomic_state
+ * @wctxt:	The write context which was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked before a unsafe driver section is entered.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully.  The buffer
+ * content is not longer trusted either and the console lock is not longer
+ * held.
+ *
+ * Internal helper to avoid duplicated code
+ */
+static bool __console_update_unsafe(struct cons_write_context *wctxt, bool unsafe)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct cons_state new;
+
+	do  {
+		if (!console_can_proceed(wctxt))
+			return false;
+		/*
+		 * console_can_proceed() saved the real state in
+		 * ctxt->old_state
+		 */
+		copy_full_state(new, ctxt->old_state);
+		new.unsafe = unsafe;
+
+	} while (!cons_state_try_cmpxchg(con, STATE_REAL, &ctxt->old_state, &new));
+
+	copy_full_state(ctxt->state, new);
+	return true;
+}
+
+/**
+ * console_enter_unsafe - Enter an unsafe region in the driver
+ * @wctxt:	The write context which was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked before a unsafe driver section is entered.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully.  The buffer
+ * content is not longer trusted either and the console lock is not longer
+ * held.
+ */
+bool console_enter_unsafe(struct cons_write_context *wctxt)
+{
+	return __console_update_unsafe(wctxt, true);
+}
+
+/**
+ * console_exit_unsafe - Exit an unsafe region in the driver
+ * @wctxt:	The write context which was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked before a unsafe driver section is exited.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully.  The buffer
+ * content is not longer trusted either and the console lock is not longer
+ * held.
+ */
+bool console_exit_unsafe(struct cons_write_context *wctxt)
+{
+	return __console_update_unsafe(wctxt, false);
+}
+
+/**
  * cons_nobkl_init - Initialize the NOBKL console state
  * @con:	Console to initialize
  */


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 24/29] printk: Put seq and dropped into cons_text_desc
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (22 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 23/29] printk: Add non-BKL console print state functions Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 25/29] printk: Provide functions to emit a ringbuffer record on non-BKL consoles Thomas Gleixner
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

Non-BKL consoles have atomic sequence tracking which is not
compatible with the legacy consoles.

Put sequence and dropped message count into struct cons_text_desc, let
cons_fill_outbuf() operate on the seq/dropped fields and let the call sites
handle the update to the corresponding fields in struct console.

This allows sharing cons_fill_outbuf() between the two worlds. For the
legacy consoles this is not more or less racy than the existing code.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |    2 ++
 kernel/printk/printk.c  |   25 ++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -232,6 +232,7 @@ struct cons_text_buf {
  * @txtbuf:		Pointer to buffer for storing the text
  * @outbuf:		Pointer to the position in @buffer for
  *			writing it out to the device
+ * @seq:		The sequence requested
  * @dropped:		The dropped count
  * @len:		Message length
  * @extmsg:		Select extended format printing
@@ -239,6 +240,7 @@ struct cons_text_buf {
 struct cons_outbuf_desc {
 	struct cons_text_buf	*txtbuf;
 	char			*outbuf;
+	u64			seq;
 	unsigned long		dropped;
 	unsigned int		len;
 	bool			extmsg;
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2740,7 +2740,7 @@ static void cons_print_dropped(struct co
  * If it is NULL then records have been dropped or skipped and con->seq
  * has been forwarded so the caller can try to print the next record.
  */
-static bool cons_fill_outbuf(struct console *con, struct cons_outbuf_desc *desc)
+static bool cons_fill_outbuf(struct cons_outbuf_desc *desc)
 {
 	static int panic_console_dropped;
 
@@ -2755,12 +2755,12 @@ static bool cons_fill_outbuf(struct cons
 
 	prb_rec_init_rd(&r, &info, txtbuf->text, CONSOLE_LOG_MAX);
 
-	if (!prb_read_valid(prb, con->seq, &r))
+	if (!prb_read_valid(prb, desc->seq, &r))
 		return false;
 
-	if (con->seq != r.info->seq) {
-		con->dropped += r.info->seq - con->seq;
-		con->seq = r.info->seq;
+	if (desc->seq != r.info->seq) {
+		desc->dropped += r.info->seq - desc->seq;
+		desc->seq = r.info->seq;
 		if (panic_in_progress() && panic_console_dropped++ > 10) {
 			suppress_panic_printk = 1;
 			pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
@@ -2769,7 +2769,7 @@ static bool cons_fill_outbuf(struct cons
 
 	/* Skip record that has level above the console loglevel. */
 	if (suppress_message_printing(r.info->level)) {
-		con->seq++;
+		desc->seq++;
 		return true;
 	}
 
@@ -2789,9 +2789,7 @@ static bool cons_fill_outbuf(struct cons
 
 		desc->len = len;
 		desc->outbuf = txtbuf->text;
-		desc->dropped = con->dropped;
 		cons_print_dropped(desc);
-		con->dropped = desc->dropped;
 	}
 
 	return true;
@@ -2820,16 +2818,21 @@ static bool console_emit_next_record(str
 				     bool *handover, bool extmsg)
 {
 	struct cons_outbuf_desc desc = {
-		.txtbuf	= txtbuf,
-		.extmsg = extmsg,
+		.txtbuf		= txtbuf,
+		.extmsg		= extmsg,
+		.seq		= con->seq,
+		.dropped	= con->dropped,
 	};
 	unsigned long flags;
 
 	*handover = false;
 
-	if (!cons_fill_outbuf(con, &desc))
+	if (!cons_fill_outbuf(&desc))
 		return false;
 
+	con->seq = desc.seq;
+	con->dropped = desc.dropped;
+
 	if (!desc.outbuf)
 		goto skip;
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 25/29] printk: Provide functions to emit a ringbuffer record on non-BKL consoles
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (23 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 24/29] printk: Put seq and dropped into cons_text_desc Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 26/29] printk: Add threaded printing support Thomas Gleixner
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

Utilize the shared fill function and add the required safety points to
check for handover/takeover and invoke the atomic write function of the
console driver. Add the proper handling for updating the sequence number.

Co-Developed-by: John Ogness <jogness@linutronix.de>
Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |    7 ++
 kernel/printk/printk_nobkl.c |  107 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 112 insertions(+), 2 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -281,10 +281,12 @@ struct console;
  * @newseq:		The sequence number for progress
  * @prio:		Priority of the context
  * @txtbuf:		Pointer to the text buffer for this context
+ * @dropped:		Dropped counter for the current context
  * @thread:		The acquire is printk thread context
  * @hostile:		Hostile takeover requested. Cleared on normal
  *			acquire or friendly handover
  * @spinwait:		Spinwait on acquire if possible
+ * @backlog:		Ringbuffer has pending records
  */
 struct cons_context {
 	struct console		*console;
@@ -297,9 +299,11 @@ struct cons_context {
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
 	struct cons_text_buf	*txtbuf;
+	unsigned long		dropped;
 	unsigned int		thread		: 1;
 	unsigned int		hostile		: 1;
 	unsigned int		spinwait	: 1;
+	unsigned int		backlog		: 1;
 };
 
 /**
@@ -380,6 +384,9 @@ struct console {
 #ifndef CONFIG_64BIT
 	atomic_t __private	atomic_seq;
 #endif
+
+	bool (*write_atomic)(struct console *con, struct cons_write_context *wctxt);
+
 	struct cons_context_data __percpu	*pcpu_data;
 	struct cons_context_data		ctxt_data;
 };
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -328,7 +328,7 @@ static void cons_context_sequence_init(s
  * invalid. Caller has to reacquire the console.
  */
 #ifdef CONFIG_64BIT
-static bool __maybe_unused cons_sequence_try_update(struct cons_context *ctxt)
+static bool cons_sequence_try_update(struct cons_context *ctxt)
 {
 	struct console *con = ctxt->console;
 	struct cons_state old, new;
@@ -354,7 +354,7 @@ static bool __maybe_unused cons_sequence
 	return true;
 }
 #else
-static bool __maybe_unused cons_sequence_try_update(struct cons_context *ctxt)
+static bool cons_sequence_try_update(struct cons_context *ctxt)
 {
 	struct console *con = ctxt->console;
 	unsigned long old, new, cur;
@@ -1006,6 +1006,109 @@ bool console_exit_unsafe(struct cons_wri
 	return __console_update_unsafe(wctxt, false);
 }
 
+static bool cons_fill_outbuf(struct cons_outbuf_desc *desc);
+
+/**
+ * cons_get_record - Fill the buffer with the next pending ringbuffer record
+ * @wctxt:	The write context which will be handed to the write function
+ *
+ * Returns:	True if there are records to print. If the output buffer is
+ *		filled @wctxt->outbuf points to the text, otherwise it is NULL.
+ *
+ *		False signals that there are no pending records anymore and
+ *		the printing can stop.
+ */
+static bool cons_get_record(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct cons_outbuf_desc desc = {
+		.txtbuf		= ctxt->txtbuf,
+		.extmsg		= con->flags & CON_EXTENDED,
+		.seq		= ctxt->newseq,
+		.dropped	= ctxt->dropped,
+	};
+	bool progress = cons_fill_outbuf(&desc);
+
+	ctxt->newseq = desc.seq;
+	ctxt->dropped = desc.dropped;
+
+	wctxt->pos = 0;
+	wctxt->len = desc.len;
+	wctxt->outbuf = desc.outbuf;
+	return progress;
+}
+
+/**
+ * cons_emit_record - Emit record in the acquired context
+ * @wctxt:	The write context which will be handed to the write function
+ *
+ * Returns:	False if the operation was aborted (takeover)
+ *		True otherwise
+ *
+ * In case of takeover the caller is not allowed to touch console state.
+ * The console is owned by someone else. If the caller wants to print
+ * more it has to reacquire the console first.
+ *
+ * If it returns true @wctxt->ctxt.backlog indicates whether there are
+ * still records pending in the ringbuffer,
+ */
+static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	bool done = false;
+
+	/*
+	 * @con->dropped is not protected in case of hostile takeovers so
+	 * the update below is racy. Annotate it accordingly.
+	 */
+	ctxt->dropped = data_race(READ_ONCE(con->dropped));
+
+	/* Fill the output buffer with the next record */
+	ctxt->backlog = cons_get_record(wctxt);
+	if (!ctxt->backlog)
+		return true;
+
+	/* Safety point. Don't touch state in case of takeover */
+	if (!console_can_proceed(wctxt))
+		return false;
+
+	/* Counterpart to the read above */
+	WRITE_ONCE(con->dropped, ctxt->dropped);
+
+	/*
+	 * In case of skipped records, Update sequence state in @con.
+	 */
+	if (!wctxt->outbuf)
+		goto update;
+
+	/* Tell the driver about potential unsafe state */
+	wctxt->unsafe = ctxt->state.unsafe;
+
+	if (!ctxt->thread && con->write_atomic) {
+		done = con->write_atomic(con, wctxt);
+	} else {
+		cons_release(ctxt);
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/* If not done, the write was aborted due to takeover */
+	if (!done)
+		return false;
+
+	ctxt->newseq++;
+update:
+	/*
+	 * The sequence update attempt is not part of console_release()
+	 * because in panic situations the console is not released by
+	 * the panic CPU until all records are written. On 32bit the
+	 * sequence is seperate from state anyway.
+	 */
+	return cons_sequence_try_update(ctxt);
+}
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console state
  * @con:	Console to initialize


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 26/29] printk: Add threaded printing support
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (24 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 25/29] printk: Provide functions to emit a ringbuffer record on non-BKL consoles Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 27/29] printk: Add write context storage for atomic writes Thomas Gleixner
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

From: John Ogness <jogness@linutronix.de>

Add the infrastructure to create a printer thread per console along with
the required thread function which is takeover/handover aware.

Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |   12 ++
 kernel/printk/printk_nobkl.c |  207 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 214 insertions(+), 5 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -17,6 +17,7 @@
 #include <linux/atomic.h>
 #include <linux/bits.h>
 #include <linux/rculist.h>
+#include <linux/rcuwait.h>
 #include <linux/types.h>
 
 struct vc_data;
@@ -357,6 +358,12 @@ struct cons_context_data {
  *
  * @atomic_state:	State array for non-BKL consoles. Real and handover
  * @atomic_seq:		Sequence for record tracking (32bit only)
+ * @kthread:		Pointer to kernel thread
+ * @rcuwait:		RCU wait for the kernel thread
+ * @kthread_running:	Indicator whether the kthread is running
+ * @thread_txtbuf:	Pointer to thread private buffer
+ * @write_atomic:	Write callback for atomic context
+ * @write_thread:	Write callback for threaded printing
  * @pcpu_data:		Pointer to percpu context data
  * @ctxt_data:		Builtin context data for early boot and threaded printing
  */
@@ -384,8 +391,13 @@ struct console {
 #ifndef CONFIG_64BIT
 	atomic_t __private	atomic_seq;
 #endif
+	struct task_struct	*kthread;
+	struct rcuwait		rcuwait;
+	atomic_t		kthread_running;
+	struct cons_text_buf	*thread_txtbuf;
 
 	bool (*write_atomic)(struct console *con, struct cons_write_context *wctxt);
+	bool (*write_thread)(struct console *con, struct cons_write_context *wctxt);
 
 	struct cons_context_data __percpu	*pcpu_data;
 	struct cons_context_data		ctxt_data;
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -50,6 +50,8 @@
  */
 
 #ifdef CONFIG_PRINTK
+static bool printk_threads_enabled __ro_after_init;
+static bool printk_force_atomic __initdata;
 
 static bool cons_release(struct cons_context *ctxt);
 
@@ -238,8 +240,8 @@ static void cons_context_set_text_buf(st
 {
 	struct console *con = ctxt->console;
 
-	/* Early boot or allocation fail? */
-	if (!con->pcpu_data)
+	/* Early boot, allocation fail or thread context? */
+	if (!con->pcpu_data || ctxt->thread)
 		ctxt->txtbuf = &con->ctxt_data.txtbuf;
 	else
 		ctxt->txtbuf = &(this_cpu_ptr(con->pcpu_data)->txtbuf);
@@ -840,6 +842,8 @@ static bool __maybe_unused cons_release(
 {
 	bool ret = __cons_release(ctxt);
 
+	/* Invalidate the record pointer. It's not longer valid */
+	ctxt->txtbuf = NULL;
 	ctxt->state.atom = 0;
 	return ret;
 }
@@ -1022,10 +1026,9 @@ static bool cons_fill_outbuf(struct cons
 static bool cons_get_record(struct cons_write_context *wctxt)
 {
 	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
-	struct console *con = ctxt->console;
 	struct cons_outbuf_desc desc = {
 		.txtbuf		= ctxt->txtbuf,
-		.extmsg		= con->flags & CON_EXTENDED,
+		.extmsg		= ctxt->console->flags & CON_EXTENDED,
 		.seq		= ctxt->newseq,
 		.dropped	= ctxt->dropped,
 	};
@@ -1054,7 +1057,7 @@ static bool cons_get_record(struct cons_
  * If it returns true @wctxt->ctxt.backlog indicates whether there are
  * still records pending in the ringbuffer,
  */
-static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
+static bool cons_emit_record(struct cons_write_context *wctxt)
 {
 	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 	struct console *con = ctxt->console;
@@ -1089,6 +1092,8 @@ static int __maybe_unused cons_emit_reco
 
 	if (!ctxt->thread && con->write_atomic) {
 		done = con->write_atomic(con, wctxt);
+	} else if (ctxt->thread && con->write_thread) {
+		done = con->write_thread(con, wctxt);
 	} else {
 		cons_release(ctxt);
 		WARN_ON_ONCE(1);
@@ -1111,6 +1116,194 @@ static int __maybe_unused cons_emit_reco
 }
 
 /**
+ * cons_kthread_should_run - Check whether the printk thread should run
+ * @con:	Console to operate on
+ * @ctxt:	The acquire context which contains the state
+ *		at console_acquire()
+ */
+static bool cons_kthread_should_run(struct console *con, struct cons_context *ctxt)
+{
+	if (kthread_should_stop())
+		return true;
+
+	/* This reads state and sequence on 64bit. On 32bit only state */
+	cons_state_read(con, STATE_REAL, &ctxt->state);
+	/* Bring the sequence in @ctxt up to date */
+	cons_context_sequence_init(ctxt);
+
+	if (!cons_state_ok(ctxt->state))
+		return false;
+
+	/*
+	 * Atomic printing is running on some other CPU. The owner
+	 * will wake the console thread on unlock if necessary.
+	 */
+	if (ctxt->state.locked)
+		return false;
+
+	return prb_read_valid(prb, ctxt->oldseq, NULL);
+}
+
+/**
+ * cons_kthread_func - The printk thread function
+ * @__console:	Console to operate on
+ */
+static int cons_kthread_func(void *__console)
+{
+	struct console *con = __console;
+	struct cons_write_context wctxt = {
+		.ctxt.console	= con,
+		.ctxt.prio	= CONS_PRIO_NORMAL,
+		.ctxt.thread	= 1,
+	};
+	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+	int ret;
+
+	atomic_set(&con->kthread_running, 1);
+
+	for (;;) {
+		atomic_dec(&con->kthread_running);
+		/*
+		 * Provides a full memory barrier vs. cons_kthread_wake().
+		 */
+		ret = rcuwait_wait_event(&con->rcuwait, cons_kthread_should_run(con, ctxt),
+					 TASK_INTERRUPTIBLE);
+
+		if (kthread_should_stop())
+			break;
+
+		atomic_inc(&con->kthread_running);
+
+		/* Wait was interrupted by a spurious signal, go back to sleep */
+		if (ret)
+			continue;
+
+		for (;;) {
+			bool backlog;
+
+			/*
+			 * Ensure this stays on the CPU to make handover and
+			 * takeover possible.
+			 */
+			migrate_disable();
+
+			/*
+			 * Try to acquire the console without attempting to
+			 * take over.  If an atomic printer wants to hand
+			 * back to the thread it simply wakes it up.
+			 */
+			if (!cons_try_acquire(ctxt))
+				break;
+
+			/* Stop when the console was handed/taken over */
+			if (!cons_emit_record(&wctxt))
+				break;
+
+			backlog = ctxt->backlog;
+
+			/* Stop when the console was handed/taken over */
+			if (!cons_release(ctxt))
+				break;
+
+			/* Backlog done? */
+			if (!backlog)
+				break;
+
+			migrate_enable();
+			cond_resched();
+		}
+		migrate_enable();
+	}
+	return 0;
+}
+
+/**
+ * cons_kthread_wake - Wake up a printk thread
+ * @con:	Console to operate on
+ */
+static inline void cons_kthread_wake(struct console *con)
+{
+	rcuwait_wake_up(&con->rcuwait);
+}
+
+/**
+ * cons_kthread_stop - Stop a printk thread
+ * @con:	Console to operate on
+ */
+static void cons_kthread_stop(struct console *con)
+{
+	struct task_struct *kt;
+
+	lockdep_assert_held(&console_mutex);
+
+	if (!con->kthread)
+		return;
+
+	/*
+	 * Nothing else than the thread itself can see @con->kthread
+	 * anymore. @con is unhashed and all list walkers are synchronized.
+	 */
+	kt = con->kthread;
+	con->kthread = NULL;
+	kthread_stop(kt);
+
+	kfree(con->thread_txtbuf);
+	con->thread_txtbuf = NULL;
+}
+
+/**
+ * cons_kthread_create - Create a printk thread
+ * @con:	Console to operate on
+ *
+ * If it fails, let the console proceed. The atomic part might
+ * be usable and useful.
+ */
+static void cons_kthread_create(struct console *con)
+{
+	struct task_struct *kt;
+
+	lockdep_assert_held(&console_mutex);
+
+	if (!(con->flags & CON_NO_BKL) || !con->write_thread)
+		return;
+
+	if (!printk_threads_enabled || con->kthread)
+		return;
+
+	con->thread_txtbuf = kmalloc(sizeof(*con->thread_txtbuf), GFP_KERNEL);
+	if (!con->thread_txtbuf) {
+		con_printk(KERN_ERR, con, "unable to allocate memory for printing thread\n");
+		return;
+	}
+
+	kt = kthread_run(cons_kthread_func, con, "pr/%s%d", con->name, con->index);
+	if (IS_ERR(kt)) {
+		con_printk(KERN_ERR, con, "unable to start printing thread\n");
+		kfree(con->thread_txtbuf);
+		con->thread_txtbuf = NULL;
+		return;
+	}
+
+	con->kthread = kt;
+}
+
+static int __init printk_setup_threads(void)
+{
+	struct console *con;
+
+	if (printk_force_atomic)
+		return 0;
+
+	console_list_lock();
+	printk_threads_enabled = true;
+	for_each_registered_console(con)
+		cons_kthread_create(con);
+	console_list_unlock();
+	return 0;
+}
+early_initcall(printk_setup_threads);
+
+/**
  * cons_nobkl_init - Initialize the NOBKL console state
  * @con:	Console to initialize
  */
@@ -1123,7 +1316,10 @@ static void cons_nobkl_init(struct conso
 
 	cons_alloc_percpu_data(con);
 	cons_forward_sequence(con);
+	rcuwait_init(&con->rcuwait);
+	cons_kthread_create(con);
 	cons_state_set(con, STATE_REAL, &state);
+	cons_kthread_wake(con);
 }
 
 /**
@@ -1134,6 +1330,7 @@ static void cons_nobkl_cleanup(struct co
 {
 	struct cons_state state = { };
 
+	cons_kthread_stop(con);
 	cons_state_set(con, STATE_REAL, &state);
 	cons_free_percpu_data(con);
 }


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 27/29] printk: Add write context storage for atomic writes
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (25 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 26/29] printk: Add threaded printing support Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

From: John Ogness <jogness@linutronix.de>

The number of consoles is unknown at compile time and allocating write
contexts on stack in emergency/panic situations is not desired either.

Allocate a write context array (one for each priority level) along with
the per CPU output buffers.

Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -253,6 +253,7 @@ struct cons_outbuf_desc {
  * @CONS_PRIO_NORMAL:		Regular printk
  * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
  * @CONS_PRIO_PANIC:		Panic output
+ * @CONS_PRIO_MAX:		The number of priority levels
  *
  * Emergency output can carefully takeover the console even without consent
  * of the owner, ideally only when @cons_state::unsafe is not set. Panic
@@ -265,6 +266,7 @@ enum cons_prio {
 	CONS_PRIO_NORMAL,
 	CONS_PRIO_EMERGENCY,
 	CONS_PRIO_PANIC,
+	CONS_PRIO_MAX,
 };
 
 struct console;
@@ -327,12 +329,17 @@ struct cons_write_context {
 
 /**
  * struct cons_context_data - console context data
+ * @wctxt:		Write context per priority level
  * @txtbuf:		Buffer for storing the text
  *
  * Used for early boot embedded into struct console and for
  * per CPU data.
+ *
+ * The write contexts are allocated to avoid having them on stack, e.g. in
+ * warn() or panic().
  */
 struct cons_context_data {
+	struct cons_write_context	wctxt[CONS_PRIO_MAX];
 	struct cons_text_buf		txtbuf;
 };
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 28/29] printk: Provide functions for atomic write enforcement
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (26 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 27/29] printk: Add write context storage for atomic writes Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-27 13:55   ` John Ogness
                     ` (3 more replies)
  2022-09-10 22:28 ` [patch RFC 29/29] printk: Add atomic write enforcement to warn/panic Thomas Gleixner
                   ` (4 subsequent siblings)
  32 siblings, 4 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

From: John Ogness <jogness@linutronix.de>

Threaded printk is the preferred mechanism to tame the noisyness of printk,
but WARN/OOPS/PANIC require to print out immediately as the printer threads
might not be able to run.

Add per CPU state which denotes the priority/urgency of the output and
provide functions which flush the printk backlog during early boot and in
priority elevated contexts.

Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/console.h      |    6 +
 kernel/printk/printk.c       |   26 +++--
 kernel/printk/printk_nobkl.c |  217 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 9 deletions(-)

--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
 
 #include <linux/atomic.h>
 #include <linux/bits.h>
+#include <linux/irq_work.h>
 #include <linux/rculist.h>
 #include <linux/rcuwait.h>
 #include <linux/types.h>
@@ -367,6 +368,7 @@ struct cons_context_data {
  * @atomic_seq:		Sequence for record tracking (32bit only)
  * @kthread:		Pointer to kernel thread
  * @rcuwait:		RCU wait for the kernel thread
+ * @irq_work:		IRQ work for thread wakeup
  * @kthread_running:	Indicator whether the kthread is running
  * @thread_txtbuf:	Pointer to thread private buffer
  * @write_atomic:	Write callback for atomic context
@@ -400,6 +402,7 @@ struct console {
 #endif
 	struct task_struct	*kthread;
 	struct rcuwait		rcuwait;
+	struct irq_work		irq_work;
 	atomic_t		kthread_running;
 	struct cons_text_buf	*thread_txtbuf;
 
@@ -471,6 +474,9 @@ extern bool console_can_proceed(struct c
 extern bool console_enter_unsafe(struct cons_write_context *wctxt);
 extern bool console_exit_unsafe(struct cons_write_context *wctxt);
 
+extern enum cons_prio cons_atomic_enter(enum cons_prio prio);
+extern void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio);
+
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1072,6 +1072,8 @@ static inline void log_buf_add_cpu(void)
 #endif /* CONFIG_SMP */
 
 static void cons_alloc_percpu_data(struct console *con);
+static void cons_atomic_flush(void);
+static void cons_wake_threads(void);
 
 static void __init set_percpu_data_ready(void)
 {
@@ -2270,17 +2272,21 @@ asmlinkage int vprintk_emit(int facility
 
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
+	/*
+	 * The caller may be holding system-critical or
+	 * timing-sensitive locks. Disable preemption during
+	 * printing of all remaining records to all consoles so that
+	 * this context can return as soon as possible. Hopefully
+	 * another printk() caller will take over the printing.
+	 */
+	preempt_disable();
+
+	/* Flush the non-BKL consoles if required */
+	cons_atomic_flush();
+
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
 		/*
-		 * The caller may be holding system-critical or
-		 * timing-sensitive locks. Disable preemption during
-		 * printing of all remaining records to all consoles so that
-		 * this context can return as soon as possible. Hopefully
-		 * another printk() caller will take over the printing.
-		 */
-		preempt_disable();
-		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers. With the
 		 * spinning variant, this context tries to take over the
@@ -2288,9 +2294,11 @@ asmlinkage int vprintk_emit(int facility
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
-		preempt_enable();
 	}
 
+	preempt_enable();
+
+	cons_wake_threads();
 	wake_up_klogd();
 	return printed_len;
 }
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -1231,6 +1231,222 @@ static inline void cons_kthread_wake(str
 }
 
 /**
+ * cons_irq_work - irq work to wake printk thread
+ * @irq_work:	The irq work to operate on
+ */
+static void cons_irq_work(struct irq_work *irq_work)
+{
+	struct console *con = container_of(irq_work, struct console, irq_work);
+
+	cons_kthread_wake(con);
+}
+
+/**
+ * cons_wake_threads - Wake up printing threads
+ */
+static void cons_wake_threads(void)
+{
+	struct console *con;
+	int cookie;
+
+	cookie = srcu_read_lock(&console_srcu);
+	for_each_console_srcu(con) {
+		if (con->kthread && !atomic_read(&con->kthread_running))
+			irq_work_queue(&con->irq_work);
+	}
+	srcu_read_unlock(&console_srcu, cookie);
+}
+
+/**
+ * struct cons_cpu_state - Per CPU printk context state
+ * @prio:	The current context priority level
+ * @nesting:	Per priority nest counter
+ */
+struct cons_cpu_state {
+	enum cons_prio	prio;
+	int		nesting[CONS_PRIO_MAX];
+};
+
+static DEFINE_PER_CPU(struct cons_cpu_state, cons_pcpu_state);
+static struct cons_cpu_state early_cons_pcpu_state __initdata;
+
+/**
+ * cons_get_cpu_state - Get the per CPU console state pointer
+ *
+ * Returns either a pointer to the per CPU state of the current CPU or to
+ * the init data state during early boot.
+ */
+static __ref struct cons_cpu_state *cons_get_cpu_state(void)
+{
+	if (!printk_percpu_data_ready())
+		return &early_cons_pcpu_state;
+	else
+		return this_cpu_ptr(&cons_pcpu_state);
+}
+
+/**
+ * cons_get_wctxt - Get the write context for atomic printing
+ * @con:	Console to operate on
+ * @prio:	Priority of the context
+ *
+ * Returns either the per CPU context or the builtin context for
+ * early boot.
+ */
+static struct cons_write_context *cons_get_wctxt(struct console *con,
+						 enum cons_prio prio)
+{
+	if (!con->pcpu_data)
+		return &con->ctxt_data.wctxt[prio];
+
+	return &this_cpu_ptr(con->pcpu_data)->wctxt[prio];
+}
+
+/**
+ * cons_atomic_try_acquire - Try to acquire the console for atomic printing
+ * @con:	The console to acquire
+ * @ctxt:	The console context instance to work on
+ * @prio:	The priority of the current context
+ */
+static bool cons_atomic_try_acquire(struct console *con, struct cons_context *ctxt,
+				    enum cons_prio prio)
+{
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->console		= con;
+	ctxt->spinwait_max_us	= 2000;
+	ctxt->prio		= prio;
+	ctxt->spinwait		= 1;
+
+	/* Try to acquire it directly or via a friendly handover */
+	if (cons_try_acquire(ctxt))
+		return true;
+
+	/* Investigate whether a hostile takeover is due */
+	if (ctxt->old_state.cur_prio >= prio)
+		return false;
+
+	ctxt->hostile = 1;
+	return cons_try_acquire(ctxt);
+}
+
+/**
+ * cons_atomic_flush_one - Flush one console in atomic mode
+ * @con:	The console to flush
+ * @prio:	The priority of the current context
+ */
+static void cons_atomic_flush_one(struct console *con, enum cons_prio prio)
+{
+	struct cons_write_context *wctxt = cons_get_wctxt(con, prio);
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	if (!cons_atomic_try_acquire(con, ctxt, prio))
+		return;
+
+	do {
+		/*
+		 * cons_emit_record() returns false when the console was
+		 * handed over or taken over. In both cases the context is
+		 * not longer valid.
+		 */
+		if (!cons_emit_record(wctxt))
+			return;
+	} while (ctxt->backlog);
+
+	cons_release(ctxt);
+}
+
+/**
+ * cons_atomic_flush - Flush consoles in atomic mode if required
+ */
+static void cons_atomic_flush(void)
+{
+	struct cons_cpu_state *cpu_state;
+	struct console *con;
+	int cookie;
+
+	cpu_state = cons_get_cpu_state();
+
+	/*
+	 * Let the outermost write of this priority print. This avoids
+	 * nasty hackery for nested WARN() where the printing itself
+	 * generates one.
+	 *
+	 * cpu_state->prio <= CONS_PRIO_NORMAL is not subject to nesting
+	 * and it can fall through for early boot and for consoles which do
+	 * not have a kthread (yet). For simplicity sake just fall through.
+	 */
+	if (cpu_state->prio > CONS_PRIO_NORMAL &&
+	    cpu_state->nesting[cpu_state->prio] != 1)
+		return;
+
+	cookie = srcu_read_lock(&console_srcu);
+	for_each_console_srcu(con) {
+		if (!con->write_atomic)
+			continue;
+
+		if (cpu_state->prio > CONS_PRIO_NORMAL || !con->kthread)
+			cons_atomic_flush_one(con, cpu_state->prio);
+	}
+	srcu_read_unlock(&console_srcu, cookie);
+}
+
+/**
+ * cons_atomic_enter - Enter a context which enforces atomic printing
+ * @prio:	Priority of the context
+ *
+ * Returns:	The previous priority which needs to be fed into
+ *		the corresponding cons_atomic_exit()
+ */
+enum cons_prio cons_atomic_enter(enum cons_prio prio)
+{
+	struct cons_cpu_state *cpu_state;
+	enum cons_prio prev_prio;
+
+	migrate_disable();
+	cpu_state = cons_get_cpu_state();
+
+	prev_prio = cpu_state->prio;
+	if (prev_prio < prio)
+		cpu_state->prio = prio;
+
+	/*
+	 * Increment the nesting on @cpu_state->prio so a WARN()
+	 * nested into a panic printout does not attempt to
+	 * scribble state.
+	 */
+	cpu_state->nesting[cpu_state->prio]++;
+
+	return prev_prio;
+}
+
+/**
+ * cons_atomic_exit - Exit a context which enforces atomic printing
+ * @prio:	Priority of the context to leave
+ * @prev_prio:	Priority of the previous context for restore
+ *
+ * @prev_prio is the priority returned by the corresponding cons_atomic_enter().
+ */
+void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio)
+{
+	struct cons_cpu_state *cpu_state;
+
+	cpu_state = cons_get_cpu_state();
+
+	/*
+	 * Undo the nesting of cons_atomic_enter() at the CPU state
+	 * priority.
+	 */
+	cpu_state->nesting[cpu_state->prio]--;
+
+	/*
+	 * Restore the previous priority which was returned by
+	 * cons_atomic_enter().
+	 */
+	cpu_state->prio = prev_prio;
+
+	migrate_enable();
+}
+
+/**
  * cons_kthread_stop - Stop a printk thread
  * @con:	Console to operate on
  */
@@ -1321,6 +1537,7 @@ static void cons_nobkl_init(struct conso
 	cons_alloc_percpu_data(con);
 	cons_forward_sequence(con);
 	rcuwait_init(&con->rcuwait);
+	init_irq_work(&con->irq_work, cons_irq_work);
 	cons_kthread_create(con);
 	cons_state_set(con, STATE_REAL, &state);
 	cons_kthread_wake(con);


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [patch RFC 29/29] printk: Add atomic write enforcement to warn/panic...
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (27 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
@ 2022-09-10 22:28 ` Thomas Gleixner
  2022-09-10 22:56 ` [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:28 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

Signed-off-by: John Ogness <jogness@linutronix.de>

Invoke the atomic write enforcement functions for warn/panic to ensure that
the information gets out to the consoles.

This is not yet a final solution as this still unlocks consoles and depends
on the "reliablity" of legacy consoles which are invoked during printk().

Once the legacy is converted over this can be changed to lock consoles on
entry, print undisturbed and release them on exit.

Signed-off-by: John Ogness <jogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/panic.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -256,6 +256,12 @@ void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	/*
+	 * No point in saving the previous printk severity level
+	 * here. Panic won't come back
+	 */
+	cons_atomic_enter(CONS_PRIO_PANIC);
+
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
@@ -602,6 +608,10 @@ struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	enum cons_prio prev_prio;
+
+	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
+
 	disable_trace_on_warning();
 
 	if (file)
@@ -633,6 +643,8 @@ void __warn(const char *file, int line,
 
 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
 }
 
 #ifndef __WARN_FLAGS


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 00/29] printk: A new approach - WIP
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (28 preceding siblings ...)
  2022-09-10 22:28 ` [patch RFC 29/29] printk: Add atomic write enforcement to warn/panic Thomas Gleixner
@ 2022-09-10 22:56 ` Thomas Gleixner
  2022-09-11  9:01 ` Paul E. McKenney
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2022-09-10 22:56 UTC (permalink / raw)
  To: LKML
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On Sun, Sep 11 2022 at 00:27, Thomas Gleixner wrote:
> For testing we used the most simple driver: a hacked up version of the
> early uart8250 console as we wanted to concentrate on validating the core
> mechanisms of friendly handover and hostile takeovers instead of dealing
> with the horrors of port locks or whatever at the same time. That's the
> next challenge. Hack patch will be provided in a reply.

Here you go.

---
Subject: serial: 8250: Use 8250 serial for exploring noBKL consoles
From: John Ogness <jogness@linutronix.de>
Date: Sat, 10 Sep 2022 01:05:34 +0200

From: John Ogness <jogness@linutronix.de>

Utilize 8250 early console - the only console in the kernel which is
reentrancy and NMI safe - to explore the noBKL console infrastructure.

Not-Signed-off-by: John Ogness <jogness@linutronix.de>
Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/tty/serial/8250/8250_early.c |   32 ++++++++++++++++++++++++++++++++
 drivers/tty/serial/8250/Kconfig      |   11 ++++++++++-
 drivers/tty/serial/8250/Makefile     |    2 +-
 3 files changed, 43 insertions(+), 2 deletions(-)

--- a/drivers/tty/serial/8250/8250_early.c
+++ b/drivers/tty/serial/8250/8250_early.c
@@ -107,6 +107,34 @@ static void early_serial8250_write(struc
 	uart_console_write(port, s, count, serial_putc);
 }
 
+static bool __early_serial8250_write(struct console *con, struct cons_write_context *wctxt,
+				     unsigned char c)
+{
+	struct earlycon_device *device = con->data;
+	struct uart_port *port = &device->port;
+	unsigned char *s = wctxt->outbuf;
+
+	serial_putc(port, c);
+
+	for (; wctxt->pos < wctxt->len; wctxt->pos++, s++) {
+		if (!console_can_proceed(wctxt))
+			return false;
+
+		uart_console_write(port, s, 1, serial_putc);
+	}
+	return true;
+}
+
+static bool early_serial8250_write_thread(struct console *con, struct cons_write_context *wctxt)
+{
+	return __early_serial8250_write(con, wctxt, 'T');
+}
+
+static bool early_serial8250_write_atomic(struct console *con, struct cons_write_context *wctxt)
+{
+	return __early_serial8250_write(con, wctxt, 'A');
+}
+
 #ifdef CONFIG_CONSOLE_POLL
 static int early_serial8250_read(struct console *console,
 				 char *s, unsigned int count)
@@ -170,6 +198,10 @@ int __init early_serial8250_setup(struct
 
 	device->con->write = early_serial8250_write;
 	device->con->read = early_serial8250_read;
+	device->con->flags &= ~CON_BOOT;
+	device->con->flags |= CON_NO_BKL;
+	device->con->write_thread = early_serial8250_write_thread;
+	device->con->write_atomic = early_serial8250_write_atomic;
 	return 0;
 }
 EARLYCON_DECLARE(uart8250, early_serial8250_setup);
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -82,9 +82,18 @@ config SERIAL_8250_FINTEK
 
 	  If unsure, say N.
 
+config SERIAL_8250_CONSOLE_EARLY
+	bool "Console on 8250/16550 and compatible noBKL console mockup"
+	default SERIAL_8250
+	select SERIAL_CORE_CONSOLE
+	select SERIAL_EARLYCON
+	help
+	  Mockup to demonstrate the core capabilities for noBKL consoles.
+	  OTOH, the _only_ reliable reentrant and NMI safe console...
+
 config SERIAL_8250_CONSOLE
 	bool "Console on 8250/16550 and compatible serial port"
-	depends on SERIAL_8250=y
+	depends on SERIAL_8250=y && !SERIAL_8250_CONSOLE_EARLY
 	select SERIAL_CORE_CONSOLE
 	select SERIAL_EARLYCON
 	help
--- a/drivers/tty/serial/8250/Makefile
+++ b/drivers/tty/serial/8250/Makefile
@@ -20,7 +20,7 @@ obj-$(CONFIG_SERIAL_8250_CS)		+= serial_
 obj-$(CONFIG_SERIAL_8250_ACORN)		+= 8250_acorn.o
 obj-$(CONFIG_SERIAL_8250_ASPEED_VUART)	+= 8250_aspeed_vuart.o
 obj-$(CONFIG_SERIAL_8250_BCM2835AUX)	+= 8250_bcm2835aux.o
-obj-$(CONFIG_SERIAL_8250_CONSOLE)	+= 8250_early.o
+obj-$(CONFIG_SERIAL_8250_CONSOLE_EARLY)	+= 8250_early.o
 obj-$(CONFIG_SERIAL_8250_FOURPORT)	+= 8250_fourport.o
 obj-$(CONFIG_SERIAL_8250_ACCENT)	+= 8250_accent.o
 obj-$(CONFIG_SERIAL_8250_BOCA)		+= 8250_boca.o

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 00/29] printk: A new approach - WIP
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (29 preceding siblings ...)
  2022-09-10 22:56 ` [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
@ 2022-09-11  9:01 ` Paul E. McKenney
  2022-09-11 12:01 ` Linus Torvalds
  2022-09-12 16:40 ` printk meeting at LPC 2022 John Ogness
  32 siblings, 0 replies; 62+ messages in thread
From: Paul E. McKenney @ 2022-09-11  9:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On Sun, Sep 11, 2022 at 12:27:31AM +0200, Thomas Gleixner wrote:

[ . . . ]

> The infrastructure we implemented comes with the following properties:
> 
>   - It shares the console list, but only for registration/unregistration
>     purposes. Consoles which are utilizing the new infrastructure are
>     ignored by the existing mechanisms and vice versa. This allows to
>     reuse as much code as possible and preserves the printk semantics
>     except for the threaded printing part.
> 
>   - The console list walk becomes SRCU protected to avoid any restrictions
>     on contexts

I am guessing that this means that you need an NMI-safe srcu_read_lock()
and srcu_read_unlock().  If my guess is correct, please let me know,
and I will create one for you.  (As it stands, these are NMI-safe on x86,
but not on architectures using the asm-generic variant of this_cpu_inc().

The result would be srcu_read_lock_nmi() and srcu_read_unlock_nmi()
or similar.  There would need to be something to prevent mixing of
srcu_read_lock() and srcu_read_lock_nmi().

Or are you somehow avoiding ever invoking either srcu_read_lock() or
srcu_read_unlock() from NMI context?

For example, are we simply living dangerously with NMI-based stack traces
as suggested below?  ;-)

>   - Consoles become stateful to handle handover and takeover in a graceful
>     way.
> 
>   - All console state operations rely solely on atomic*_try_cmpxchg() so
>     they work in any context.
> 
>   - Console locking is per console to allow friendly handover or "safe"
>     hostile takeover in emergency/panic situations. Console lock is not
>     relevant for consoles converted to the new infrastructure.
> 
>   - The core provides interfaces for console drivers to query whether they
>     can proceed and to denote 'unsafe' sections in the console state, which
>     is unavoidable for some console drivers.
> 
>     In fact there is not a single regular (non-early) console driver today
>     which is reentrancy safe under all circumstances, which enforces that
>     NMI context is excluded from printing directly. TBH, that's a sad state
>     of affairs.
> 
>     The unsafe state bit allows to make informed decisions in the core
>     code, e.g. to postpone printing if there are consoles available which
>     are safe to acquire. In case of a hostile takeover the unsafe state bit
>     is handed to the atomic write callback so that the console driver can
>     act accordingly.
>     
>   - Printing is delegated to a per console printer thread except for the
>     following situations:
> 
>        - Early boot
>        - Emergency printing (WARN/OOPS)
>        - Panic printing
> 
> The integration is complete, but without any fancy things, like locking all
> consoles when entering a WARN, print and unlock when done. Such things only
> make sense once all drivers are converted over because that conflicts with
> the way how the existing console lock mechanics work.
> 
> For testing we used the most simple driver: a hacked up version of the
> early uart8250 console as we wanted to concentrate on validating the core
> mechanisms of friendly handover and hostile takeovers instead of dealing
> with the horrors of port locks or whatever at the same time. That's the
> next challenge. Hack patch will be provided in a reply.
> 
> Here is sample output where we let the atomic and thread write functions
> prepend each line with the printing context (A=atomic, T=thread):
> 
> A[    0.394066] ... fixed-purpose events:   3
> A[    0.395130] ... event mask:             000000070000000f
> 
> End of early boot, thread starts
> 
> TA[    0.396821] rcu: Hierarchical SRCU implementation.
> 
> ^^ Thread starts printing and immediately raises a warning, so atomic
>    context at the emergency priority takes over and continues printing.
> 
>    This is a forceful, but safe takeover scenario as the WARN context
>    is obviously on the same CPU as the printing thread where friendly
>    is not an option.
> 
> A[    0.397133] rcu: 	Max phase no-delay instances is 400.
> A[    0.397640] ------------[ cut here ]------------
> A[    0.398072] WARNING: CPU: 0 PID: 13 at drivers/tty/serial/8250/8250_early.c:123 __early_serial8250_write.isra.0+0x80/0xa0
> ....
> A[    0.440131]  ret_from_fork+0x1f/0x30
> A[    0.441133]  </TASK>
> A[    0.441867] ---[ end trace 0000000000000000 ]---
> T[    0.443493] smp: Bringing up secondary CPUs ...
> 
> After the warning the thread continues printing.
> 
> ....
> 
> T[    1.916873] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
> T[    1.918719] md: Waiting for all devices to be available before autod
> 
> A[    1.918719] md: Waiting for all devices to be available before autodetect
> 
> System panics because it can't find a root file system. Panic printing
> takes over the console from the printer thread immediately and reprints
> the interrupted line.
> 
> This case is a friendly handover from the printing thread to the panic
> context because the printing thread was not running on the panic CPU, but
> handed over gracefully.
> 
> A[    1.919942] md: If you don't use raid, use raid=noautodetect
> A[    1.921030] md: Autodetecting RAID arrays.
> A[    1.921919] md: autorun ...
> A[    1.922686] md: ... autorun DONE.
> A[    1.923761] /dev/root: Can't open blockdev
> 
> So far the implemented state handling machinery holds up on the various
> handover and hostile takeover situations we enforced for testing.
> 
> Hostile takeover is nevertheless a special case. If the driver is in an
> unsafe region that's something which needs to be dealt with per driver.
> There is not much the core code can do here except of trying a friendly
> handover first and only enforcing it after a timeout or not trying to print
> on such consoles.
> 
> This needs some thought, but we explicitely did not implement any takeover
> policy into the core state handling mechanism as this is really a decision
> which needs to be made at the call site. See patch 28.
> 
> We are soliciting feedback on that approach and we hope that we can
> organize a BOF in Dublin on short notice.

Last I checked, there were still some slots open.

> The series is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git printk
> 
> The series has the following parts:
> 
>    Patches  1 - 5:   Cleanups
> 
>    Patches  6 - 12:  Locking and list conversion
> 
>    Patches 13 - 18:  Improved output buffer handling to prepare for
>    	      	     code sharing
> 
>    Patches 19 - 29:  New infrastructure implementation
> 
> Most of the preparatory patches 1-18 have probably a value on their own.
> 
> Don't be scared about the patch stats below. We added kernel doc and
> extensive comments to the code:
> 
>   kernel/printk/printk_nobkl.c: Code: 668 lines, Comments: 697 lines, Ratio: 1:1.043

When I reach stable AC, I will fire off some rcutorture tests.  From the
discussion above, it sounds like I should expect some boot-time warnings?
Or were those strictly printk-specific testing?

> Of course the code is trivial and straight forward as any other facility
> which has to deal with concurrency and the twist of being safe in any
> context. :)

;-) ;-) ;-)

							Thanx, Paul

> Comments welcome.
> 
> Thanks,
> 
> 	tglx
> ---
> [1] https://lore.kernel.org/lkml/87r11qp63n.ffs@tglx/
> ---
>  arch/parisc/include/asm/pdc.h  |    2 
>  arch/parisc/kernel/pdc_cons.c  |   53 -
>  arch/parisc/kernel/traps.c     |   17 
>  b/kernel/printk/printk_nobkl.c | 1564 +++++++++++++++++++++++++++++++++++++++++
>  drivers/tty/serial/kgdboc.c    |    7 
>  drivers/tty/tty_io.c           |    6 
>  fs/proc/consoles.c             |   12 
>  fs/proc/kmsg.c                 |    2 
>  include/linux/console.h        |  375 +++++++++
>  include/linux/printk.h         |    9 
>  include/linux/syslog.h         |    3 
>  kernel/debug/kdb/kdb_io.c      |    7 
>  kernel/panic.c                 |   12 
>  kernel/printk/printk.c         |  485 ++++++++----
>  14 files changed, 2304 insertions(+), 250 deletions(-)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 00/29] printk: A new approach - WIP
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (30 preceding siblings ...)
  2022-09-11  9:01 ` Paul E. McKenney
@ 2022-09-11 12:01 ` Linus Torvalds
  2022-09-12 16:40 ` printk meeting at LPC 2022 John Ogness
  32 siblings, 0 replies; 62+ messages in thread
From: Linus Torvalds @ 2022-09-11 12:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On Sat, Sep 10, 2022 at 6:27 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> After taking a step back we decided to go for a radically different
> approach:

From a quick look through the patches this morning, I see nothing
alarming. The proof is in the pudding, but this seems to have a sane
model for console list handling and for handling the individual
console states.

But I'm on a laptop and only read through the patches while going
through my email this morning, so I may well have missed something.

                 Linus

^ permalink raw reply	[flat|nested] 62+ messages in thread

* printk meeting at LPC 2022
  2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
                   ` (31 preceding siblings ...)
  2022-09-11 12:01 ` Linus Torvalds
@ 2022-09-12 16:40 ` John Ogness
  2022-09-15 11:00   ` Sergey Senozhatsky
  2022-09-23 14:49   ` John Ogness
  32 siblings, 2 replies; 62+ messages in thread
From: John Ogness @ 2022-09-12 16:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Kacur, John B. Wyatt IV, Sebastian Andrzej Siewior

Hi,

We now have a room/timeslot [0] where Thomas and I will be presenting
and discussing this new approach [1] for bringing kthread and atomic
console printing to the kernel.

Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"

John Ogness

[0] https://lpc.events/event/16/contributions/1394/
[1] https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 01/29] printk: Make pr_flush() static
  2022-09-10 22:27 ` [patch RFC 01/29] printk: Make pr_flush() static Thomas Gleixner
@ 2022-09-14 11:27   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 11:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> 
> No user outside the printk code and no reason to export this.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 02/29] printk: Declare log_wait properly
  2022-09-10 22:27 ` [patch RFC 02/29] printk: Declare log_wait properly Thomas Gleixner
@ 2022-09-14 11:29   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 11:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> kernel/printk/printk.c:365:1: warning: symbol 'log_wait' was not declared. Should it be static?
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers
  2022-09-10 22:27 ` [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers Thomas Gleixner
@ 2022-09-14 11:33   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 11:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> Subject: [patch RFC 03/29] printk: Remove write only variable
>  nr_ext_console_drivers
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles
  2022-09-10 22:27 ` [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles Thomas Gleixner
@ 2022-09-14 11:40   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 11:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> Date: Sun, 11 Sep 2022 00:27:38 +0200 (CEST)
> From: Thomas Gleixner <tglx@linutronix.de>
> To: LKML <linux-kernel@vger.kernel.org>
> Cc: John Ogness <john.ogness@linutronix.de>, Petr Mladek
>  <pmladek@suse.com>, Sergey Senozhatsky <senozhatsky@chromium.org>, Steven
>  Rostedt <rostedt@goodmis.org>, Linus Torvalds
>  <torvalds@linuxfoundation.org>, Peter Zijlstra <peterz@infradead.org>,
>  "Paul E. McKenney" <paulmck@kernel.org>, Daniel Vetter <daniel@ffwll.ch>,
>  Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Helge Deller
>  <deller@gmx.de>, Jason Wessel <jason.wessel@windriver.com>, Daniel
>  Thompson <daniel.thompson@linaro.org>
> Subject: [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles
> Message-ID: <20220910222300.582492276@linutronix.de>
> 
> The comment about unregistering boot consoles is just not matching the
> reality. Remove it.

Hmm, yeah, we ~CON_ENABLED console that we unregister before we console_unlock().

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init
  2022-09-10 22:27 ` [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init Thomas Gleixner
@ 2022-09-14 11:41   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 11:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> 
> This variable cannot change post boot.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
@ 2022-09-14 12:05   ` Sergey Senozhatsky
  2022-09-14 12:31   ` Sergey Senozhatsky
  2022-09-27  9:56   ` Petr Mladek
  2 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 12:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> 
> Unprotected list walks are a brilliant idea. Especially in the context of
> hotpluggable consoles.

If it crashes on you then "you're holding the computer the wrong way" ;)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
  2022-09-14 12:05   ` Sergey Senozhatsky
@ 2022-09-14 12:31   ` Sergey Senozhatsky
  2022-09-19 12:49     ` John Ogness
  2022-09-27  9:56   ` Petr Mladek
  2 siblings, 1 reply; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 12:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
[..]
> + * console_sem protects the console_drivers list, and also provides
> + * serialization for access to the entire console driver system.
> + *
> + * console_mutex serializes register/unregister. console_sem has to be
> + * taken for any list manipulation inside the console_mutex locked
> + * section to keep the console BKL machinery happy.
>   */
> +static DEFINE_MUTEX(console_mutex);
>  static DEFINE_SEMAPHORE(console_sem);
[..]
>  /*
>   * Helper macros to handle lockdep when locking/unlocking console_sem. We use
>   * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -2978,17 +3008,21 @@ struct tty_driver *console_device(int *i
>  void console_stop(struct console *console)
>  {
>  	__pr_flush(console, 1000, true);
> +	console_list_lock();
>  	console_lock();
>  	console->flags &= ~CON_ENABLED;
>  	console_unlock();
> +	console_list_unlock();
>  }
>  EXPORT_SYMBOL(console_stop);
>  
>  void console_start(struct console *console)
>  {
> +	console_list_lock();
>  	console_lock();
>  	console->flags |= CON_ENABLED;
>  	console_unlock();
> +	console_list_unlock();
>  	__pr_flush(console, 1000, true);
>  }
>  EXPORT_SYMBOL(console_start);

So the comment says that list lock (console_mutex) is to serialize
register/unregister, but then we take it in stop/start as well. What
does list lock protect us against in start/stop? console->flags reader
(console_is_usable()) does not take list lock, it's called under console
lock and console->flags writers (console_unregister() and console_stop())
modify console->flags under console_lock.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 07/29] printk: Convert console list walks for readers to list lock
  2022-09-10 22:27 ` [patch RFC 07/29] printk: Convert console list walks for readers to list lock Thomas Gleixner
@ 2022-09-14 12:46   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 12:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> Facilities which expose console information to sysfs or procfs can use the
> new list protection to keep the list stable. No need to hold console lock.

Yeah I guess it makes sense to take the list lock in console stop/start then.
(I wonder if it'll be better to do it in this patch.)

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 08/29] parisc: Put console abuse into one place
  2022-09-10 22:27 ` [patch RFC 08/29] parisc: Put console abuse into one place Thomas Gleixner
@ 2022-09-14 14:56   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 14:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> 
> PARISC has a hope based mechanism to restore consoles in case of a HPMC
> (machine check exception) which is scattered over several places.
> 
> Move it into one place to make further changes simpler and add comments.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function
  2022-09-10 22:27 ` [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function Thomas Gleixner
@ 2022-09-14 14:59   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 14:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> 
> Unprotected list walks are not necessarily safe.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

For what it's worth
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 10/29] kgbd: Pretend that console list walk is safe
  2022-09-10 22:27 ` [patch RFC 10/29] kgbd: Pretend that console list walk is safe Thomas Gleixner
@ 2022-09-14 15:03   ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-14 15:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson

On (22/09/11 00:27), Thomas Gleixner wrote:
> Provide a special list iterator macro for KGDB to allow unprotected list
> walks and add a few comments to explain the hope based approach.
> 
> Preperatory change for changing the console list to hlist and adding
> lockdep asserts to regular list walks.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

[..]
> -	for_each_console(c) {
> +	/*
> +	 * This is a completely unprotected list walk designed by the
> +	 * wishful thinking departement. See the oops_inprogress comment

s/oops_inprogress/oops_in_progress/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-12 16:40 ` printk meeting at LPC 2022 John Ogness
@ 2022-09-15 11:00   ` Sergey Senozhatsky
  2022-09-15 11:09     ` Steven Rostedt
  2022-09-23 14:49   ` John Ogness
  1 sibling, 1 reply; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-15 11:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Thomas Gleixner, LKML, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson, John Kacur, John B. Wyatt IV,
	Sebastian Andrzej Siewior

Hi,

On (22/09/12 18:46), John Ogness wrote:
> Hi,
> 
> We now have a room/timeslot [0] where Thomas and I will be presenting
> and discussing this new approach [1] for bringing kthread and atomic
> console printing to the kernel.
> 
> Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"

Was this recorded? I glanced through LPC/kernel summit schedules and didn't
find it anywhere.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-15 11:00   ` Sergey Senozhatsky
@ 2022-09-15 11:09     ` Steven Rostedt
  2022-09-15 15:25       ` Sergey Senozhatsky
  0 siblings, 1 reply; 62+ messages in thread
From: Steven Rostedt @ 2022-09-15 11:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Thomas Gleixner, LKML, Petr Mladek, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Kacur, John B. Wyatt IV, Sebastian Andrzej Siewior

On Thu, 15 Sep 2022 20:00:57 +0900
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:

> Hi,
> 
> On (22/09/12 18:46), John Ogness wrote:
> > Hi,
> > 
> > We now have a room/timeslot [0] where Thomas and I will be presenting
> > and discussing this new approach [1] for bringing kthread and atomic
> > console printing to the kernel.
> > 
> > Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"  
> 
> Was this recorded? I glanced through LPC/kernel summit schedules and didn't
> find it anywhere.

Yes it was, but it will take a bit to extract it from BBB and upload it to YouTube.

-- Steve

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-15 11:09     ` Steven Rostedt
@ 2022-09-15 15:25       ` Sergey Senozhatsky
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Senozhatsky @ 2022-09-15 15:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, John Ogness, Thomas Gleixner, LKML,
	Petr Mladek, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson, John Kacur, John B. Wyatt IV,
	Sebastian Andrzej Siewior

On (22/09/15 07:09), Steven Rostedt wrote:
> > > We now have a room/timeslot [0] where Thomas and I will be presenting
> > > and discussing this new approach [1] for bringing kthread and atomic
> > > console printing to the kernel.
> > > 
> > > Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"  
> > 
> > Was this recorded? I glanced through LPC/kernel summit schedules and didn't
> > find it anywhere.
> 
> Yes it was, but it will take a bit to extract it from BBB and upload it to YouTube.

Thanks Steven!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-14 12:31   ` Sergey Senozhatsky
@ 2022-09-19 12:49     ` John Ogness
  0 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-19 12:49 UTC (permalink / raw)
  To: Sergey Senozhatsky, Thomas Gleixner
  Cc: LKML, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On 2022-09-14, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> On (22/09/11 00:27), Thomas Gleixner wrote:
> [..]
>> + * console_sem protects the console_drivers list, and also provides
>> + * serialization for access to the entire console driver system.
>> + *
>> + * console_mutex serializes register/unregister. console_sem has to be
>> + * taken for any list manipulation inside the console_mutex locked
>> + * section to keep the console BKL machinery happy.
>>   */
>> +static DEFINE_MUTEX(console_mutex);
>>  static DEFINE_SEMAPHORE(console_sem);
> [..]
>>  /*
>>   * Helper macros to handle lockdep when locking/unlocking console_sem. We use
>>   * macros instead of functions so that _RET_IP_ contains useful information.
>> @@ -2978,17 +3008,21 @@ struct tty_driver *console_device(int *i
>>  void console_stop(struct console *console)
>>  {
>>  	__pr_flush(console, 1000, true);
>> +	console_list_lock();
>>  	console_lock();
>>  	console->flags &= ~CON_ENABLED;
>>  	console_unlock();
>> +	console_list_unlock();
>>  }
>>  EXPORT_SYMBOL(console_stop);
>>  
>>  void console_start(struct console *console)
>>  {
>> +	console_list_lock();
>>  	console_lock();
>>  	console->flags |= CON_ENABLED;
>>  	console_unlock();
>> +	console_list_unlock();
>>  	__pr_flush(console, 1000, true);
>>  }
>>  EXPORT_SYMBOL(console_start);
>
> So the comment says that list lock (console_mutex) is to serialize
> register/unregister, but then we take it in stop/start as well. What
> does list lock protect us against in start/stop? console->flags reader
> (console_is_usable()) does not take list lock, it's called under console
> lock and console->flags writers (console_unregister() and console_stop())
> modify console->flags under console_lock.

Currently all writers to console->flags are holding the
console_lock. However, there are console->flags readers that do _not_
hold the console_lock (register_console, unregister_console,
printk_late_init).

Aside from adding list synchronization, the list lock also provides the
missing console->flags synchronization. Now all console->flags writers
hold the list lock _and_ console_lock. A console->flags reader can hold
either the list lock or the console_lock.

Since console_start and console_stop are console->flags writers, they
also need to take the list lock. I agree that this should be mentioned
in the commit message and code comments.

The follow-up patch in the series only deals with list/flags
readers. Therefore I think the change to console_stop/console_start
belongs in this patch, which focusses on fixing synchronization.

John

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-12 16:40 ` printk meeting at LPC 2022 John Ogness
  2022-09-15 11:00   ` Sergey Senozhatsky
@ 2022-09-23 14:49   ` John Ogness
  2022-09-23 15:16     ` Linus Torvalds
                       ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: John Ogness @ 2022-09-23 14:49 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Kacur, John B. Wyatt IV, Sebastian Andrzej Siewior

Hi,

On 2022-09-12, John Ogness <john.ogness@linutronix.de> wrote:
> We now have a room/timeslot [0] where Thomas and I will be presenting
> and discussing this new approach [1] for bringing kthread and atomic
> console printing to the kernel.

Thanks to everyone who attended the meeting (in person and virtually)!
It was a productive and fun discussion that left me thinking we will get
the printk threading right this time.

Here are the main points that I took away from the meeting:

- Printing the backlog is important! If some emergency situation occurs,
  make sure the backlog gets printed.

- When an emergency occurs, put the full backtrace into the ringbuffer
  before flushing any backlog. This ensures that the backtrace makes it
  into the ringbuffer in case a panic occurs while flushing the backlog.

- A newline should be added when an atomic console takes over from a
  threaded console. This improves readability. We may decide later that
  the atomic console prints some extra information upon takeover, or
  that it completes the line the threaded console was printing. But for
  now we will just use a newline to keep things simple.

- It should be visible to users and in crash reports if legacy consoles
  were in use. It was suggested that a new TAINT flag could be used for
  this.

- There will need to be new console flags introduced so that safe
  printing decisions can be made in emergency and panic situations.

  For example, upon panic, intially only the consoles marked RELIABLE
  would be used. If any of the RELIABLE consoles required a hostile
  takeover, they would only be used if they are labeled to support safe
  hostile takeovers.

  All other consoles could then be tried as a "last hope" at the very
  end of panic(), after all records have been flushed to reliable
  consoles and when it no longer matters if a console kills the CPU. For
  non-panic emergencies (warn, rcu stalls, etc), there may be other
  flags that would be needed.

  Initially we do not plan to have any such flags. We can add them on an
  as-needed basis as console drivers are moved over to the new
  thread/atomic interface.

If I have missed anything relevant, please let me know.

John Ogness

> [0] https://lpc.events/event/16/contributions/1394/
> [1] https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-23 14:49   ` John Ogness
@ 2022-09-23 15:16     ` Linus Torvalds
  2022-09-23 15:20     ` Sebastian Andrzej Siewior
  2022-09-23 15:31     ` Steven Rostedt
  2 siblings, 0 replies; 62+ messages in thread
From: Linus Torvalds @ 2022-09-23 15:16 UTC (permalink / raw)
  To: John Ogness
  Cc: Thomas Gleixner, LKML, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Kacur, John B. Wyatt IV, Sebastian Andrzej Siewior

On Fri, Sep 23, 2022 at 7:49 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> - Printing the backlog is important! If some emergency situation occurs,
>   make sure the backlog gets printed.

Yeah, I really liked the notion of doing the oops with just filling
the back buffer but still getting it printed out if something goes
wrong in the middle.

That said, I'm sure we can tweak the exact "how much of the back log
we print" if there are any real life issues that look even remotely
like the demo did.

It's not like you couldn't do a "skipping lines" message if there are
thousands of old non-emergency lines in the back buffer, and
prioritize getting the recent ones out first.

I doubt it ends up being an issue in practice, but basically I wanted
to just pipe up and say that the exact details of how much of the back
buffer needs to be flushed first _could_ be tweaked if it ever does
come up as an issue.

                    Linus

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-23 14:49   ` John Ogness
  2022-09-23 15:16     ` Linus Torvalds
@ 2022-09-23 15:20     ` Sebastian Andrzej Siewior
  2022-09-23 15:31     ` Steven Rostedt
  2 siblings, 0 replies; 62+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-09-23 15:20 UTC (permalink / raw)
  To: John Ogness
  Cc: Thomas Gleixner, LKML, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Peter Zijlstra, Paul E. McKenney,
	Daniel Vetter, Greg Kroah-Hartman, Helge Deller, Jason Wessel,
	Daniel Thompson, John Kacur, John B. Wyatt IV

On 2022-09-23 16:55:28 [+0206], John Ogness wrote:
> If I have missed anything relevant, please let me know.

I just wanted to state, that there was discussion at the end about
removing the early_printk drivers in favour of the atomic-printing
driver/ support which should be capable to do the same.

> John Ogness

Sebastian

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: printk meeting at LPC 2022
  2022-09-23 14:49   ` John Ogness
  2022-09-23 15:16     ` Linus Torvalds
  2022-09-23 15:20     ` Sebastian Andrzej Siewior
@ 2022-09-23 15:31     ` Steven Rostedt
  2 siblings, 0 replies; 62+ messages in thread
From: Steven Rostedt @ 2022-09-23 15:31 UTC (permalink / raw)
  To: John Ogness
  Cc: Thomas Gleixner, LKML, Petr Mladek, Sergey Senozhatsky,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Kacur, John B. Wyatt IV, Sebastian Andrzej Siewior

On Fri, 23 Sep 2022 16:55:28 +0206
John Ogness <john.ogness@linutronix.de> wrote:

>   All other consoles could then be tried as a "last hope" at the very
>   end of panic(), after all records have been flushed to reliable
>   consoles and when it no longer matters if a console kills the CPU. For
>   non-panic emergencies (warn, rcu stalls, etc), there may be other
>   flags that would be needed.

I think we may need to check if kexec is involved. We don't want one of
these "last hope" consoles to lock up the system preventing kexec to occur.

But if there's no kexec, and the system is just going to lock up anyway,
then sure go ahead and call the unsafe consoles.

-- Steve

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
  2022-09-14 12:05   ` Sergey Senozhatsky
  2022-09-14 12:31   ` Sergey Senozhatsky
@ 2022-09-27  9:56   ` Petr Mladek
  2022-09-27 15:19     ` Petr Mladek
  2 siblings, 1 reply; 62+ messages in thread
From: Petr Mladek @ 2022-09-27  9:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On Sun 2022-09-11 00:27:41, Thomas Gleixner wrote:
> Unprotected list walks are a brilliant idea. Especially in the context of
> hotpluggable consoles.

Yeah, it is crazy. And it is there probably since the beginning.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -79,10 +79,14 @@ int oops_in_progress;
>  EXPORT_SYMBOL(oops_in_progress);
>  
>  /*
> - * console_sem protects the console_drivers list, and also
> - * provides serialisation for access to the entire console
> - * driver system.
> + * console_sem protects the console_drivers list, and also provides
> + * serialization for access to the entire console driver system.
> + *
> + * console_mutex serializes register/unregister. console_sem has to be
> + * taken for any list manipulation inside the console_mutex locked
> + * section to keep the console BKL machinery happy.
>   */
> +static DEFINE_MUTEX(console_mutex);
>  static DEFINE_SEMAPHORE(console_sem);
>  struct console *console_drivers;
>  EXPORT_SYMBOL_GPL(console_drivers);
> @@ -220,6 +230,26 @@ int devkmsg_sysctl_set_loglvl(struct ctl
>  }
>  #endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
>  
> +/**
> + * console_list_lock - Lock the console list
> + *
> + * For non-console related list walks, e.g. procfs, sysfs...
> + */
> +void console_list_lock(void)
> +{
> +	mutex_lock(&console_mutex);
> +}
> +
> +/**
> + * console_list_unlock - Unlock the console list
> + *
> + * Counterpart to console_list_lock()
> + */
> +void console_list_unlock(void)
> +{
> +	mutex_unlock(&console_mutex);
> +}
> +
>  /*
>   * Helper macros to handle lockdep when locking/unlocking console_sem. We use
>   * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -3107,13 +3143,14 @@ void register_console(struct console *ne
>  	bool realcon_enabled = false;
>  	int err;
>  
> -	for_each_console(con) {
> +	console_list_lock();

Hmm, the new mutex is really nasty. It has very strange semantic.
It makes the locking even more complicated.

The ideal solution would be take console_lock() here. We (me and
Sergey) never did it because con->match() and con->setup()
callbacks were called in try_enable_*console(). We were afraid
that some might want to take console_lock() and it could create
a deadlock. There were too many drivers and we did not found time
to check them all. And it had low priority because nobody reported
problems.

A good enough solution might be call this under the later
added srcu_read_lock(&console_srcu) and use for_each_console_srcu().

The srcu walk would prevent seeing broken list. Obviously,
the code might see outdated list and do bad decisions:

  + try to enable the same console twice

  + enable more consoles by default in try_enable_default_console()

  + associate more consoles with /dev/console, see CON_CONSDEV in
    try_enable_preferred_console() and try_enable_default_console()

If we race then we could end up with more consoles enabled by default
and with more consoles with CON_CONSDEV flag.

IMHO, the rcu walk is an acceptable and conservative solution.
Registering the same driver twice is hard to imagine at all.
And I have never seen reports about too many default consoles
or CON_CONSDEV flags.

Anyway, I would like to avoid adding console_mutex. From my POV,
it is a hack that complicates the code. Taking console_lock()
should be enough. Using rcu walk would be good enough.

Do I miss something, please?

> +	for_each_registered_console(con) {
>  		if (WARN(con == newcon, "console '%s%d' already registered\n",
>  					 con->name, con->index))
> -			return;
> +			goto unlock;
>  	}
>  
> -	for_each_console(con) {
> +	for_each_registered_console(con) {
>  		if (con->flags & CON_BOOT)
>  			bootcon_enabled = true;
>  		else

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 20/29] printk: Add non-BKL console acquire/release logic
  2022-09-10 22:28 ` [patch RFC 20/29] printk: Add non-BKL console acquire/release logic Thomas Gleixner
@ 2022-09-27 13:49   ` John Ogness
  0 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-27 13:49 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Below is a fix that was used for the LPC2022 demo so that a hostile
takeover upon timeout is possible.

On 2022-09-11, Thomas Gleixner <tglx@linutronix.de> wrote:
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * __cons_try_acquire - Try to acquire the console for printk output
> + * @ctxt:	Pointer to an aquire context which contains
> + *		all information about the acquire mode
> + *
> + * Returns: True if the acquire was successful. False on fail.
> + *
> + * In case of success @ctxt->state contains the acquisition
> + * state.
> + *
> + * In case of fail @ctxt->old_state contains the state
> + * which was read from @con->state for analysis by the caller.
> + */
> +static bool __cons_try_acquire(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +	unsigned int cpu = smp_processor_id();
> +	struct cons_state old, new;
> +
> +	if (WARN_ON_ONCE(!(con->flags & CON_NO_BKL)))
> +		return false;
> +
> +	cons_state_read(con, STATE_REAL, &old);
> +
> +again:
> +	if (cons_check_panic())
> +		return false;
> +
> +	/* Preserve it for the caller and for spinwait */
> +	copy_full_state(ctxt->old_state, old);
> +
> +	if (!cons_state_ok(old))
> +		return false;
> +
> +	/* Set up the new state for takeover */
> +	copy_full_state(new, old);
> +	new.locked = 1;
> +	new.thread = ctxt->thread;
> +	new.cur_prio = ctxt->prio;
> +	new.req_prio = CONS_PRIO_NONE;
> +	new.cpu = cpu;
> +
> +	/* Attempt to acquire it directly if unlocked */
> +	if (!old.locked) {
> +		if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
> +			goto again;
> +
> +		ctxt->hostile = false;
> +		copy_full_state(ctxt->state, new);
> +		goto success;
> +	}
> +
> +	/*
> +	 * Give up if the calling context is the printk thread. The
> +	 * atomic writer will wake the thread when it is done with
> +	 * the important output.
> +	 */
> +	if (ctxt->thread)
> +		return false;
> +
> +	/*
> +	 * If the active context is on the same CPU then there is
> +	 * obviously no handshake possible.
> +	 */
> +	if (old.cpu == cpu)
> +		goto check_hostile;
> +
> +	/*
> +	 * If the caller did not request spin-waiting or a request with the
> +	 * same or higher priority is pending then check whether a hostile
> +	 * takeover is due.
> +	 */
> +	if (!ctxt->spinwait || old.req_prio >= ctxt->prio)
> +		goto check_hostile;
> +
> +	/* Proceed further with spin acquire */
> +	if (!cons_setup_handover(ctxt))
> +		return false;
> +
> +	/*
> +	 * Setup the request in state[REAL]. Hand in the state, which was
> +	 * used to make the decision to spinwait above, for comparison.
> +	 */
> +	if (!cons_setup_request(ctxt, old))
> +		return false;
> +
> +	/* Now spin on it */
> +	if (!cons_try_acquire_spin(ctxt))
> +		return false;

Instead of returning false here, it needs to:

		goto check_hostile;

cons_try_acquire_spin() may have failed due to a timeout, in which case
a hostile takeover might be necessary.

> +success:
> +	/* Common updates on success */
> +	return true;
> +
> +check_hostile:
> +	if (!ctxt->hostile)
> +		return false;
> +
> +	if (!cons_state_try_cmpxchg(con, STATE_REAL, &old, &new))
> +		goto again;
> +
> +	ctxt->hostile = true;
> +	copy_full_state(ctxt->state, new);
> +	goto success;
> +}

John Ogness

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 28/29] printk: Provide functions for atomic write enforcement
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
@ 2022-09-27 13:55   ` John Ogness
  2022-09-27 14:40   ` John Ogness
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-27 13:55 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Below is a fix that was used for the LPC2022 demo so that atomic
printing occurs in NMI context.

On 2022-09-11, Thomas Gleixner <tglx@linutronix.de> wrote:
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1072,6 +1072,8 @@ static inline void log_buf_add_cpu(void)
>  #endif /* CONFIG_SMP */
>  
>  static void cons_alloc_percpu_data(struct console *con);
> +static void cons_atomic_flush(void);
> +static void cons_wake_threads(void);
>  
>  static void __init set_percpu_data_ready(void)
>  {
> @@ -2270,17 +2272,21 @@ asmlinkage int vprintk_emit(int facility
>  
>  	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>  
> +	/*
> +	 * The caller may be holding system-critical or
> +	 * timing-sensitive locks. Disable preemption during
> +	 * printing of all remaining records to all consoles so that
> +	 * this context can return as soon as possible. Hopefully
> +	 * another printk() caller will take over the printing.
> +	 */
> +	preempt_disable();
> +
> +	/* Flush the non-BKL consoles if required */
> +	cons_atomic_flush();
> +
>  	/* If called from the scheduler, we can not call up(). */
>  	if (!in_sched) {
>  		/*
> -		 * The caller may be holding system-critical or
> -		 * timing-sensitive locks. Disable preemption during
> -		 * printing of all remaining records to all consoles so that
> -		 * this context can return as soon as possible. Hopefully
> -		 * another printk() caller will take over the printing.
> -		 */
> -		preempt_disable();
> -		/*
>  		 * Try to acquire and then immediately release the console
>  		 * semaphore. The release will print out buffers. With the
>  		 * spinning variant, this context tries to take over the
> @@ -2288,9 +2294,11 @@ asmlinkage int vprintk_emit(int facility
>  		 */
>  		if (console_trylock_spinning())
>  			console_unlock();
> -		preempt_enable();
>  	}
>  
> +	preempt_enable();
> +
> +	cons_wake_threads();
>  	wake_up_klogd();
>  	return printed_len;
>  }

Atomic flushing also needs to occur for the cases where deferred
printing is used (NMI or explicit defer). The following hunk will take
care of that.

@@ -3648,6 +3658,8 @@ void wake_up_klogd(void)
 
 void defer_console_output(void)
 {
+	cons_atomic_flush();
+
 	/*
 	 * New messages may have been added directly to the ringbuffer
 	 * using vprintk_store(), so wake any waiters as well.

John Ogness

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 28/29] printk: Provide functions for atomic write enforcement
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
  2022-09-27 13:55   ` John Ogness
@ 2022-09-27 14:40   ` John Ogness
  2022-09-27 14:49   ` John Ogness
  2022-09-27 15:01   ` John Ogness
  3 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-27 14:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Below is a fix that was used for the LPC2022 demo so that an
interrupting context does not clobber the console state when the
interrupted context was the console owner.

On 2022-09-11, Thomas Gleixner <tglx@linutronix.de> wrote:
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_atomic_flush_one - Flush one console in atomic mode
> + * @con:	The console to flush
> + * @prio:	The priority of the current context
> + */
> +static void cons_atomic_flush_one(struct console *con, enum cons_prio prio)
> +{
> +	struct cons_write_context *wctxt = cons_get_wctxt(con, prio);
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

@ctxt is per-console, per-cpu, and per-prio. But it is *not*
per-context. If a CPU is in EMERGENCY priority and is interrupted by
another context that calls into printk() while the first context had the
console locked, the following cons_atomic_try_acquire() will use the
same ctxt and clobber the values used by the first context. This
corrupts the state information for the first context and results in
situations where the console is never unlocked because the owner's state
was corrupt.

> +	if (!cons_atomic_try_acquire(con, ctxt, prio))
> +		return;
> +
> +	do {
> +		/*
> +		 * cons_emit_record() returns false when the console was
> +		 * handed over or taken over. In both cases the context is
> +		 * not longer valid.
> +		 */
> +		if (!cons_emit_record(wctxt))
> +			return;
> +	} while (ctxt->backlog);
> +
> +	cons_release(ctxt);
> +}

Since it is not desirable to actively print from nested contexts
of the same priority, add an @in_use flag to wctxt, allowing to
detect the situation and avoid active printing and corrupting
the state.

Applying the following patch on top does just that:

diff --git a/include/linux/console.h b/include/linux/console.h
index e6bde0e879fc..1b3028cce3f3 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -317,6 +317,7 @@ struct cons_context {
  * @len:	Length to write
  * @pos:	Current write position in @outbuf
  * @unsafe:	Invoked in unsafe state due to force takeover
+ * @in_use:	The context is in use
  */
 struct cons_write_context {
 	struct cons_context __private	ctxt;
@@ -324,6 +325,7 @@ struct cons_write_context {
 	unsigned int			len;
 	unsigned int			pos;
 	bool				unsafe;
+	bool				in_use;
 };
 
 #define CONS_MAX_NEST_LVL	8
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 736f71361799..b513b2fb2c2d 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -1338,8 +1338,12 @@ static void cons_atomic_flush_one(struct console *con, enum cons_prio prio)
 	struct cons_write_context *wctxt = cons_get_wctxt(con, prio);
 	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
-	if (!cons_atomic_try_acquire(con, ctxt, prio))
+	if (wctxt->in_use)
 		return;
+	wctxt->in_use = true;
+
+	if (!cons_atomic_try_acquire(con, ctxt, prio))
+		goto out;
 
 	do {
 		/*
@@ -1348,10 +1352,12 @@ static void cons_atomic_flush_one(struct console *con, enum cons_prio prio)
 		 * not longer valid.
 		 */
 		if (!cons_emit_record(wctxt))
-			return;
+			goto out;
 	} while (ctxt->backlog);
 
 	cons_release(ctxt);
+out:
+	wctxt->in_use = false;
 }
 
 /**

John Ogness

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [patch RFC 28/29] printk: Provide functions for atomic write enforcement
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
  2022-09-27 13:55   ` John Ogness
  2022-09-27 14:40   ` John Ogness
@ 2022-09-27 14:49   ` John Ogness
  2022-09-27 15:01   ` John Ogness
  3 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-27 14:49 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Below is a fix that was used for the LPC2022 demo so that after a
warning, the atomic printing context is not responsible for printing any
non-emergency backlog that came after the warning.

On 2022-09-11, Thomas Gleixner <tglx@linutronix.de> wrote:
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> + * cons_atomic_flush_one - Flush one console in atomic mode
> + * @con:	The console to flush
> + * @prio:	The priority of the current context
> + */
> +static void cons_atomic_flush_one(struct console *con, enum cons_prio prio)
> +{
> +	struct cons_write_context *wctxt = cons_get_wctxt(con, prio);
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	if (!cons_atomic_try_acquire(con, ctxt, prio))
> +		return;
> +
> +	do {
> +		/*
> +		 * cons_emit_record() returns false when the console was
> +		 * handed over or taken over. In both cases the context is
> +		 * not longer valid.
> +		 */
> +		if (!cons_emit_record(wctxt))
> +			return;

If the CPU is no longer in an elevated priority and kthreads are
available, abort the atomic printing and let the kthreads take over. Add
the following break condition here:

		/*
		 * If the CPU is no longer in an elevated priority, let the
		 * kthreads take over, if they are available.
		 */
		if (prio <= CONS_PRIO_NORMAL && con->kthread)
			break;


> +	} while (ctxt->backlog);
> +
> +	cons_release(ctxt);
> +}

John Ogness

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 28/29] printk: Provide functions for atomic write enforcement
  2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
                     ` (2 preceding siblings ...)
  2022-09-27 14:49   ` John Ogness
@ 2022-09-27 15:01   ` John Ogness
  3 siblings, 0 replies; 62+ messages in thread
From: John Ogness @ 2022-09-27 15:01 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds,
	Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

Below is a fix that was used for the LPC2022 demo to avoid unnecessarily
performing the console_lock/console_unlock dance.

Add a new global boolean @have_bkl_console to be able to quickly
identify if any legacy (bkl) consoles are registered. If there are none,
the console_lock/console_unlock stuff can be skipped. The following
patch does this and can be applied on top.

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1830,6 +1830,7 @@ static struct lockdep_map console_owner_dep_map = {
 static DEFINE_RAW_SPINLOCK(console_owner_lock);
 static struct task_struct *console_owner;
 static bool console_waiter;
+static bool have_bkl_console;
 
 /**
  * console_lock_spinning_enable - mark beginning of code where another
@@ -2285,7 +2286,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	cons_atomic_flush();
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (!in_sched && have_bkl_console) {
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers. With the
@@ -2575,7 +2576,7 @@ void resume_console(void)
  */
 static int console_cpu_notify(unsigned int cpu)
 {
-	if (!cpuhp_tasks_frozen) {
+	if (!cpuhp_tasks_frozen && have_bkl_console) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
@@ -3023,6 +3024,9 @@ void console_unblank(void)
 {
 	struct console *c;
 
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * console_unblank can no longer be called in interrupt context unless
 	 * oops_in_progress is set to 1..
@@ -3052,6 +3056,9 @@ void console_unblank(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * If someone else is holding the console lock, trylock will fail
 	 * and may_schedule may be set.  Ignore and proceed to unlock so
@@ -3311,6 +3318,10 @@ void register_console(struct console *newcon)
 	/* Initialize the nobkl data in @newcon */
 	cons_nobkl_init(newcon);
 
+	/* Has a legacy (BKL) console registered? */
+	if (!(newcon->flags & CON_NO_BKL))
+		have_bkl_console = true;
+
 	/*
 	 * Put this console in the list and keep the referred driver at the
 	 * head of the list.
@@ -3603,7 +3613,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = this_cpu_xchg(printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (have_bkl_console && (pending & PRINTK_PENDING_OUTPUT)) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();

John Ogness

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [patch RFC 06/29] printk: Protect [un]register_console() with a mutex
  2022-09-27  9:56   ` Petr Mladek
@ 2022-09-27 15:19     ` Petr Mladek
  0 siblings, 0 replies; 62+ messages in thread
From: Petr Mladek @ 2022-09-27 15:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson

On Tue 2022-09-27 11:56:30, Petr Mladek wrote:
> On Sun 2022-09-11 00:27:41, Thomas Gleixner wrote:
> > Unprotected list walks are a brilliant idea. Especially in the context of
> > hotpluggable consoles.
> 
> Yeah, it is crazy. And it is there probably since the beginning.
> 
> > @@ -3107,13 +3143,14 @@ void register_console(struct console *ne
> >  	bool realcon_enabled = false;
> >  	int err;
> >  
> > -	for_each_console(con) {
> > +	console_list_lock();
> 
> Hmm, the new mutex is really nasty. It has very strange semantic.
> It makes the locking even more complicated.

Please, continue the discussion in the reply to the v1 patchset,
see https://lore.kernel.org/r/YzMT27FVllY3u05k@alley

I send it to this RFC by mistake.

I am sorry for the mess.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 62+ messages in thread

* functionality: was: Re: [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles
  2022-09-10 22:28 ` [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles Thomas Gleixner
@ 2022-11-07 15:58   ` Petr Mladek
  2022-11-07 16:10   ` cosmetic: " Petr Mladek
  1 sibling, 0 replies; 62+ messages in thread
From: Petr Mladek @ 2022-11-07 15:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

On Sun 2022-09-11 00:28:01, Thomas Gleixner wrote:
> The current console/printk subsystem is protected by a Big Kernel Lock,
> aka. console_lock which has has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and makes
> forced takeover and output in emergency and panic situations a fragile
> endavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail and
> to provide a new infrastructure which avoids the pitfalls and allows
> console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Lockless (SCRU protected) console list walk
>   - Per console locking instead of global locking
>   - Per console state which allows to make informed decisions
>   - Stateful handover and takeover
> 
> As a first step this adds state to struct console. The per console state is
> a atomic_long_t with a 32bit bit field and on 64bit a 32bit sequence for
> tracking the last printed ringbuffer sequence number. On 32bit the sequence
> is seperate from state for obvious reasons which requires to handle a few
> extra race conditions.
> 
> Add the initial state with the most basic 'alive' and 'enabled' bits and
> wire it up into the console register/unregister functionality and exclude
> such consoles from being handled in the console BKL mechanisms.
> 
> The decision to use a bitfield was made as using a plain u32 and mask/shift
> operations turned out to result in uncomprehensible code.
> 
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -170,6 +172,37 @@ enum cons_flags {
>  	CON_ANYTIME		= BIT(4),
>  	CON_BRL			= BIT(5),
>  	CON_EXTENDED		= BIT(6),
> +	CON_NO_BKL		= BIT(7),
> +};
> +
> +/**
> + * struct cons_state - console state for NOBKL consoles
> + * @atom:	Compound of the state fields for atomic operations
> + * @seq:	Sequence for record tracking (64bit only)
> + * @bits:	Compound of the state bits below
> + *
> + * @alive:	Console is alive. Required for teardown

What do you exactly mean with teardown, please?

I somehow do not understand the meaning. The bit "alive" seems
to always be "1" in this patchset.

> + * @enabled:	Console is enabled. If 0, do not use
> + *
> + * To be used for state read and preparation of atomic_long_cmpxchg()
> + * operations.
> + */
> +struct cons_state {
> +	union {
> +		unsigned long	atom;
> +		struct {
> +#ifdef CONFIG_64BIT
> +			u32	seq;
> +#endif
> +			union {
> +				u32	bits;
> +				struct {
> +					u32 alive	:  1;
> +					u32 enabled	:  1;
> +				};
> +			};
> +		};
> +	};
>  };
>  
>  /**
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3079,7 +3088,10 @@ void console_stop(struct console *consol
>  	console_list_lock();
>  	console_lock();
>  	console->flags &= ~CON_ENABLED;
> +	cons_state_disable(console);
>  	console_unlock();
> +	/* Ensure that all SRCU list walks have completed */
> +	synchronize_srcu(&console_srcu);

I have few questions here:

1. Do we need separate "enabled" flags for BLK and non-blk consoles?

   Hmm, it might be problem to remove CON_ENABLED flag
   because it is exported to userspace via /proc/consoles.

   Well, what is the purpose of the "enabled" flag for atomic
   consoles? Are we going to stop them in the middle of a line?
   Does the flag has to be atomic and part of atomic_state?


2. What is the purpose of synchronize_srcu(), please?

   It probably should make sure that all consoles with CON_NO_BLK
   flag are really stopped once it returns.

   IMHO, this would work only when the "enabled" flag and the
   con->write*() callback is called under srcu_read_lock().

   I do not see it in the code. Do I miss something, please?


3. Is the ordering of console_unlock() and synchronize_srcu()
   important, please?

   IMHO, it would be important if we allowed the following code:

      srcu_read_lock(&console_srcu);
      console_lock();
      // do something
      console_unlock();
      srcu_read_unlock(&console_srcu);

   then we would always have to call synchronize_srcu() outside
   console_lock() otherwise there might be ABBA deadlock.

   I do not see this code yet. But it might make sense.
   Anyway, we should probably document the rules somewhere.


4. Is it important to call cons_state_disable(console) under
   console_lock() ?

   I guess that it isn't. But it is not clear from the code.
   The picture is even more complicated because everything is done
   under console_list_lock().

   It would make sense to explain the purpose of each lock.
   My understanding is the following:

      + console_list_lock() synchronizes manipulation of
	con->flags.

      + console_lock() makes sure that no console will
	be calling con->write() callback after console_unlock().

      + synchronize_srcu() is supposed to make sure that
	any console is calling neither con->write_kthread()
	nor con->atomic_write() after this synchronization.
	Except that it does not work from my POV.

    Anyway, I might make sense to separate the two approaches.
    Let's say:

	console_list_lock()
	if (con->flags & CON_NO_BLK) {
		noblk_console_disable(con);
	} else {
		/* cons->flags are synchronized using console_list_lock */
		console->flags &= ~CON_ENABLED;
		/*
		 * Make sure that no console calls con->write()	anymore.
		 *
		 * This ordering looks a bit ugly. But it shows how
		 * the things are serialized.
		 */
		console_lock();
		console_unlock();
	}

     , where noblk_console_disable(con) must be more complicated.
     It must be somehow synchronized with all con->write_kthread() and
     write_atomic() callers.

     I wonder if noblk_console_disable(con) might somehow use
     the hangover mechanism so that it becomes the owner of
     the console and disables the enabled flag. I mean
     to implement some sleepable cons_acquire(). But this sounds
     a bit like con->mutex that you wanted to avoid.

     It might be easier to check the flag and call con->write() under
     srcu_read_lock() so that synchronize_srcu() really waits until
     the current message gets printed.


>  	console_list_unlock();
>  }
>  EXPORT_SYMBOL(console_stop);


Best Regards,
Petr

PS: I am going to review v3 of "reduce console_lock scope" patchset
    which has arrived few hours ago.

    I just wanted to send my notes that I made last Friday
    when I continued review of this RFC.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* cosmetic: was: Re: [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles
  2022-09-10 22:28 ` [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles Thomas Gleixner
  2022-11-07 15:58   ` functionality: was: " Petr Mladek
@ 2022-11-07 16:10   ` Petr Mladek
  1 sibling, 0 replies; 62+ messages in thread
From: Petr Mladek @ 2022-11-07 16:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Peter Zijlstra, Paul E. McKenney, Daniel Vetter,
	Greg Kroah-Hartman, Helge Deller, Jason Wessel, Daniel Thompson,
	John Ogness

On Sun 2022-09-11 00:28:01, Thomas Gleixner wrote:
> The current console/printk subsystem is protected by a Big Kernel Lock,
> aka. console_lock which has has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and makes
> forced takeover and output in emergency and panic situations a fragile
> endavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail and
> to provide a new infrastructure which avoids the pitfalls and allows
> console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Lockless (SCRU protected) console list walk
>   - Per console locking instead of global locking
>   - Per console state which allows to make informed decisions
>   - Stateful handover and takeover
> 
> As a first step this adds state to struct console. The per console state is
> a atomic_long_t with a 32bit bit field and on 64bit a 32bit sequence for
> tracking the last printed ringbuffer sequence number. On 32bit the sequence
> is seperate from state for obvious reasons which requires to handle a few
> extra race conditions.
> 
> Add the initial state with the most basic 'alive' and 'enabled' bits and
> wire it up into the console register/unregister functionality and exclude
> such consoles from being handled in the console BKL mechanisms.
> 
> The decision to use a bitfield was made as using a plain u32 and mask/shift
> operations turned out to result in uncomprehensible code.
> 
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -237,6 +272,9 @@ struct console {
>  	unsigned long		dropped;
>  	void			*data;
>  	struct hlist_node	node;
> +
> +	/* NOBKL console specific members */
> +	atomic_long_t __private	atomic_state[2];

Just to be sure about the meaning. "real" state means the current
state and "handover" means a requested state.

>  };
>  
>  #ifdef CONFIG_LOCKDEP
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2339,7 +2339,9 @@ static bool suppress_message_printing(in
>  static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
>  static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
>  
> -#endif /* CONFIG_PRINTK */
> +#endif /* !CONFIG_PRINTK */
> +
> +#include "printk_nobkl.c"

Is there any chance to get rid of this?

If we need to use some of this API in printk.c then please declare
it either in "internal.h" or in a new "printk_noblk.h".

Honestly, I do not have any real arguments why it is bad. But there
are probably reasons why it is not a common pattern. IMHO, split
sources might help to:

    + speed up compilation
    + separate public and internal API
    + keep #ifdef/#else/#endif close each other in .h files
    + keep the sources somehow usable even without cscope
    + ???


>  #ifdef CONFIG_EARLY_PRINTK
>  struct console *early_console;
> @@ -2635,6 +2637,13 @@ static bool abandon_console_lock_in_pani
>   */
>  static inline bool console_is_usable(struct console *con)
>  {
> +	/*
> +	 * Exclude the NOBKL consoles. They are handled seperately
> +	 * as they do not require the console BKL
> +	 */
> +	if ((con->flags & CON_NO_BKL))
> +		return false;

This is confusing. Nobody would expect that a function called
"console_is_usable()" would return false just because the console
has CON_NO_BLK flag set.

Either we need a better name, for example, console_is_blk_and_usable().
Or please put the test into a separate function, e.g. console_is_blk()
and check it separately where needed.

IMHO, the original console_is_usable() would be useful even for
CON_NO_BLK consoles.


> +
>  	if (!(con->flags & CON_ENABLED))
>  		return false;
>  
> --- /dev/null
> +++ b/kernel/printk/printk_nobkl.c
> @@ -0,0 +1,176 @@
> +
> +enum state_selector {
> +	STATE_REAL,
> +	STATE_HANDOVER,
> +};

It might be problem that I am not a native speaker. But the names
are a bit ambiguous to me. I would personally use:

enum state_selector {
	CON_STATE_CURRENT,
	CON_STATE_REQUESTED,
};

or if it is too long: CON_STATE_CUR and CON_STATE_REQ.

Well, I do not resist on the change. I am not sure how the proposed names
would play with the followup patches. The original names might
be good after all. They are not that bad. I primary wanted
to document my first reaction ;-)

> +/**
> + * cons_nobkl_init - Initialize the NOBKL console state
> + * @con:	Console to initialize
> + */
> +static void cons_nobkl_init(struct console *con)
> +{
> +	struct cons_state state = {
> +		.alive = 1,
> +		.enabled = !!(con->flags & CON_ENABLED),
> +	};
> +
> +	cons_state_set(con, STATE_REAL, &state);
> +}

IMHO. we need to update the function description, e.g.

/**
 * cons_nobkl_init - Initialize the NOBKL console specific data
 * @con:	Console to initialize
 */


Background:

The function name does not match the rest:

  + The function name suggests that it initializes NOBLK console.

  + The function description and the implementation suggests that
    it initializes struct cons_state.

I see that the followup patches update this function. It initializes
all the members needed by noblk consoles in struct console. It
allocates per-CPU data and creates the kthread. It means
that the function name is reasonable after all.


> +
> +/**
> + * cons_nobkl_cleanup - Cleanup the NOBKL console state
> + * @con:	Console to cleanup
> + */
> +static void cons_nobkl_cleanup(struct console *con)
> +{
> +	struct cons_state state = { };
> +
> +	cons_state_set(con, STATE_REAL, &state);
> +}

Same as with cons_noblk_init(). The function does a lot
more in the later patches. The description should be

/**
 * cons_nobkl_cleanup - Cleanup the NOBKL console specific data
 * @con:	Console to cleanup
 */

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2022-11-07 16:11 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-10 22:27 [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 01/29] printk: Make pr_flush() static Thomas Gleixner
2022-09-14 11:27   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 02/29] printk: Declare log_wait properly Thomas Gleixner
2022-09-14 11:29   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 03/29] printk: Remove write only variable nr_ext_console_drivers Thomas Gleixner
2022-09-14 11:33   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 04/29] printk: Remove bogus comment vs. boot consoles Thomas Gleixner
2022-09-14 11:40   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 05/29] printk: Mark __printk percpu data ready __ro_after_init Thomas Gleixner
2022-09-14 11:41   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 06/29] printk: Protect [un]register_console() with a mutex Thomas Gleixner
2022-09-14 12:05   ` Sergey Senozhatsky
2022-09-14 12:31   ` Sergey Senozhatsky
2022-09-19 12:49     ` John Ogness
2022-09-27  9:56   ` Petr Mladek
2022-09-27 15:19     ` Petr Mladek
2022-09-10 22:27 ` [patch RFC 07/29] printk: Convert console list walks for readers to list lock Thomas Gleixner
2022-09-14 12:46   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 08/29] parisc: Put console abuse into one place Thomas Gleixner
2022-09-14 14:56   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 09/29] serial: kgdboc: Lock consoles in probe function Thomas Gleixner
2022-09-14 14:59   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 10/29] kgbd: Pretend that console list walk is safe Thomas Gleixner
2022-09-14 15:03   ` Sergey Senozhatsky
2022-09-10 22:27 ` [patch RFC 11/29] printk: Convert console_drivers list to hlist Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 12/29] printk: Prepare for SCRU console list protection Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 13/29] printk: Move buffer size defines Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 14/29] printk: Document struct console Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 15/29] printk: Add struct cons_text_buf Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 16/29] printk: Use " Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 17/29] printk: Use an output descriptor struct for emit Thomas Gleixner
2022-09-10 22:27 ` [patch RFC 18/29] printk: Handle dropped message smarter Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 19/29] printk: Add basic infrastructure for non-BKL consoles Thomas Gleixner
2022-11-07 15:58   ` functionality: was: " Petr Mladek
2022-11-07 16:10   ` cosmetic: " Petr Mladek
2022-09-10 22:28 ` [patch RFC 20/29] printk: Add non-BKL console acquire/release logic Thomas Gleixner
2022-09-27 13:49   ` John Ogness
2022-09-10 22:28 ` [patch RFC 21/29] printk: Add buffer management for noBKL consoles Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 22/29] printk: Add sequence handling for non-BKL consoles Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 23/29] printk: Add non-BKL console print state functions Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 24/29] printk: Put seq and dropped into cons_text_desc Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 25/29] printk: Provide functions to emit a ringbuffer record on non-BKL consoles Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 26/29] printk: Add threaded printing support Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 27/29] printk: Add write context storage for atomic writes Thomas Gleixner
2022-09-10 22:28 ` [patch RFC 28/29] printk: Provide functions for atomic write enforcement Thomas Gleixner
2022-09-27 13:55   ` John Ogness
2022-09-27 14:40   ` John Ogness
2022-09-27 14:49   ` John Ogness
2022-09-27 15:01   ` John Ogness
2022-09-10 22:28 ` [patch RFC 29/29] printk: Add atomic write enforcement to warn/panic Thomas Gleixner
2022-09-10 22:56 ` [patch RFC 00/29] printk: A new approach - WIP Thomas Gleixner
2022-09-11  9:01 ` Paul E. McKenney
2022-09-11 12:01 ` Linus Torvalds
2022-09-12 16:40 ` printk meeting at LPC 2022 John Ogness
2022-09-15 11:00   ` Sergey Senozhatsky
2022-09-15 11:09     ` Steven Rostedt
2022-09-15 15:25       ` Sergey Senozhatsky
2022-09-23 14:49   ` John Ogness
2022-09-23 15:16     ` Linus Torvalds
2022-09-23 15:20     ` Sebastian Andrzej Siewior
2022-09-23 15:31     ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).