* [PATCH printk v1 00/18] threaded/atomic console support
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

Hi,

This is v1 of a series to bring in a new threaded/atomic console
infrastructure. The history, motivation, and various explanations and
examples are available in the cover letter of tglx's RFC series
[0]. From that series, patches 1-18 have been mainlined as of the 6.3
merge window. What remains, patches 19-29, is what this series
represents.

Since the RFC there have been significant changes involving bug fixing,
refactoring, renaming, and feature completion. Despite the many
changes in the code, the concept and design have been preserved.

The key points of the threaded/atomic (aka NOBKL) consoles are:

- Each console has its own kthread and uses a new write_thread()
  console callback that is always called from sleepable context. The
  kthreads of different consoles do not contend with each other and do
  not use the global console lock.

- Each console uses a new write_atomic() console callback that is able
  to write from any context (including NMI). The threaded/atomic
  infrastructure provides supporting functions to help atomic console
  drivers synchronize with their own threaded and re-entrant atomic
  counterparts.

- Until the console threads are available, atomic printing is
  performed. The threaded printing is able to take over shortly before
  SMP is brought online, so machines with many CPUs should be able to
  boot at full speed without being held back by console printing.

- When console threads are shut down (on system reboot and shutdown),
  the atomic consoles again take over. This ensures that the final
  messages make it out to the consoles.

- When a panic, WARN, or NMI stall is detected, the atomic consoles
  temporarily take over printing until the related messages have been
  output. In this case the full set of related messages is stored
  into the printk ringbuffer before the atomic consoles begin
  flushing the ringbuffer to the consoles.

- Atomic printing is split into 3 priorities. For example, upon
  shutdown (when kthreads are not available), the console output will
  be normal priority atomic printing. This could be interrupted by a
  WARN on another CPU (emergency priority). And that could be
  interrupted by a panic on yet another CPU (panic priority). And of
  course any atomic printing priority can interrupt the kthread
  printer.

- The transition between kthread and any atomic printing or to any
  elevated priority atomic printing is synchronized using an atomic_t
  state variable per console. The state variable acts like a spinlock,
  but with the special properties that the spinning can time out and
  that a hostile takeover based on the atomic priorities is possible.
  After outputting each byte, a console printing context checks the
  state variable to ensure it is still the owner of the console. If it
  is not (for example, in the case of a hostile takeover) it will
  immediately abort any continued use of the console and rely on the
  new owner to flush the messages. (A driver-side sketch of this
  per-byte check follows this list.)

- Using the console state variable, console drivers can mark unsafe
  regions in their code where ownership transition is not
  possible. Combined with the timeout feature, a handover protocol,
  and the possibility for a hostile takeover, this allows drivers to
  make safe decisions about when and how console ownership is
  transferred to another context. It also allows the printk
  infrastructure to make safe decisions in panic situations, such as
  only outputting to atomic consoles where safe takeovers are possible
  and, only after handling all other panic responsibilities, attempting
  unsafe takeovers for the consoles that have not yet transferred
  ownership.

- In order to support hostile takeovers (where a CPU with a higher
  priority context can steal ownership from another CPU) without CPUs
  clobbering each other's buffers, each CPU has its own set of string
  print buffers.
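
As a rough driver-side sketch of that per-byte check (not part of
this series): serial_tx_ready() and serial_tx() are made-up driver
helpers, and console_can_proceed() stands in for the ownership-check
helper that later patches in the series provide to drivers:

  static bool my_emit_chars(struct uart_port *port,
                            struct cons_write_context *wctxt)
  {
          unsigned int i;

          for (i = 0; i < wctxt->len; i++) {
                  /* Still the owner? If not, abort immediately. */
                  if (!console_can_proceed(wctxt))
                          return false;

                  while (!serial_tx_ready(port))
                          cpu_relax();

                  serial_tx(port, wctxt->outbuf[i]);
          }

          return true;
  }

On abort, the new owner is responsible for flushing the remaining
messages from the ringbuffer.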

The existing legacy consoles continue to function unmodified as before
and legacy consoles can work next to NOBKL consoles (e.g. a legacy
virtual terminal graphics console and a network console will work with
a NOBKL uart8250 console). However, in order to have the full
benefit/reliability of NOBKL consoles, a system should use _only_
NOBKL consoles.

We believe that this series covers all printk features and usage
needed to allow the new threaded/atomic consoles to replace the legacy
consoles. However, this will be a gradual transition as individual
console drivers are updated to support the NOBKL requirements.

This series does not include any changes to console drivers to allow
them to act as NOBKL consoles. That will be a follow-up series, once a
finalized infrastructure is in place. However, I will reply to this
message with an all-in-one uart8250 patch that fully implements NOBKL
support. The patch will allow you to perform runtime tests with the
NOBKL consoles on the uart8250.

John Ogness

[0] https://lore.kernel.org/lkml/20220910221947.171557773@linutronix.de

John Ogness (7):
  kdb: do not assume write() callback available
  printk: Add NMI check to down_trylock_console_sem()
  printk: Consolidate console deferred printing
  printk: Add per-console suspended state
  printk: nobkl: Stop threads on shutdown/reboot
  rcu: Add atomic write enforcement for rcu stalls
  printk: Perform atomic flush in console_flush_on_panic()

Thomas Gleixner (11):
  printk: Add non-BKL console basic infrastructure
  printk: nobkl: Add acquire/release logic
  printk: nobkl: Add buffer management
  printk: nobkl: Add sequence handling
  printk: nobkl: Add print state functions
  printk: nobkl: Add emit function and callback functions for atomic
    printing
  printk: nobkl: Introduce printer threads
  printk: nobkl: Add printer thread wakeups
  printk: nobkl: Add write context storage for atomic writes
  printk: nobkl: Provide functions for atomic write enforcement
  kernel/panic: Add atomic write enforcement to warn/panic

 fs/proc/consoles.c           |    1 +
 include/linux/console.h      |  174 ++++
 include/linux/printk.h       |    9 +
 kernel/debug/kdb/kdb_io.c    |    2 +
 kernel/panic.c               |   17 +
 kernel/printk/Makefile       |    2 +-
 kernel/printk/internal.h     |  103 +-
 kernel/printk/printk.c       |  307 ++++--
 kernel/printk/printk_nobkl.c | 1820 ++++++++++++++++++++++++++++++++++
 kernel/printk/printk_safe.c  |    9 +-
 kernel/rcu/tree_stall.h      |    6 +
 11 files changed, 2362 insertions(+), 88 deletions(-)
 create mode 100644 kernel/printk/printk_nobkl.c


base-commit: 10d639febe5629687dac17c4a7500a96537ce11a
-- 
2.30.2



* [PATCH printk v1 01/18] kdb: do not assume write() callback available
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport

It is allowed for consoles to provide no write() callback. For
example, ttynull does this.

Check if a write() callback is available before using it.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/debug/kdb/kdb_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 5c7e9ba7cd6b..e9139dfc1f0a 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -576,6 +576,8 @@ static void kdb_msg_write(const char *msg, int msg_len)
 			continue;
 		if (c == dbg_io_ops->cons)
 			continue;
+		if (!c->write)
+			continue;
 		/*
 		 * Set oops_in_progress to encourage the console drivers to
 		 * disregard their internal spin locks: in the current calling
-- 
2.30.2



* [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem()
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The printk path is NMI safe because it only adds content to the
buffer and then triggers the delayed output via irq_work. If the
console is flushed or unblanked (on panic) from NMI then it can
deadlock in down_trylock_console_sem() because the semaphore is not
NMI safe.

Avoid try-locking the console from NMI and assume it failed.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 40c5f4170ac7..84af038292d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -318,6 +318,10 @@ static int __down_trylock_console_sem(unsigned long ip)
 	int lock_failed;
 	unsigned long flags;
 
+	/* Semaphores are not NMI-safe. */
+	if (in_nmi())
+		return 1;
+
 	/*
 	 * Here and in __up_console_sem() we need to be in safe mode,
 	 * because spindump/WARN/etc from under console ->lock will
-- 
2.30.2



* [PATCH printk v1 03/18] printk: Consolidate console deferred printing
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Printing to consoles can be deferred for several reasons:

- explicitly with printk_deferred()
- printk() in NMI context
- recursive printk() calls

The current implementation is not consistent. For printk_deferred(),
irq work is scheduled twice. For NMI and recursive printk() calls,
panic CPU suppression and caller delays are not properly enforced.

Correct these inconsistencies by consolidating the deferred printing
code so that vprintk_deferred() is the top-level function for
deferred printing and vprintk_emit() performs whichever irq_work
queueing is appropriate.

Also add kerneldoc for wake_up_klogd() and defer_console_output() to
clarify their differences and appropriate usage.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c      | 31 ++++++++++++++++++++++++-------
 kernel/printk/printk_safe.c |  9 ++-------
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 84af038292d9..bdeaf12e0bd2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2321,7 +2321,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 		preempt_enable();
 	}
 
-	wake_up_klogd();
+	if (in_sched)
+		defer_console_output();
+	else
+		wake_up_klogd();
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -3811,11 +3814,30 @@ static void __wake_up_klogd(int val)
 	preempt_enable();
 }
 
+/**
+ * wake_up_klogd - Wake kernel logging daemon
+ *
+ * Use this function when new records have been added to the ringbuffer
+ * and the console printing for those records is handled elsewhere. In
+ * this case only the logging daemon needs to be woken.
+ *
+ * Context: Any context.
+ */
 void wake_up_klogd(void)
 {
 	__wake_up_klogd(PRINTK_PENDING_WAKEUP);
 }
 
+/**
+ * defer_console_output - Wake kernel logging daemon and trigger
+ *	console printing in a deferred context
+ *
+ * Use this function when new records have been added to the ringbuffer
+ * but the current context is unable to perform the console printing.
+ * This function also wakes the logging daemon.
+ *
+ * Context: Any context.
+ */
 void defer_console_output(void)
 {
 	/*
@@ -3832,12 +3854,7 @@ void printk_trigger_flush(void)
 
 int vprintk_deferred(const char *fmt, va_list args)
 {
-	int r;
-
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, fmt, args);
-	defer_console_output();
-
-	return r;
+	return vprintk_emit(0, LOGLEVEL_SCHED, NULL, fmt, args);
 }
 
 int _printk_deferred(const char *fmt, ...)
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index ef0f9a2044da..6d10927a07d8 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -38,13 +38,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) || in_nmi()) {
-		int len;
-
-		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
-		defer_console_output();
-		return len;
-	}
+	if (this_cpu_read(printk_context) || in_nmi())
+		return vprintk_deferred(fmt, args);
 
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
-- 
2.30.2



* [PATCH printk v1 04/18] printk: Add per-console suspended state
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Currently the global @console_suspended is used to determine if
consoles are in a suspended state. Its primary purpose is to allow
usage of the console_lock when suspended without causing console
printing. It is synchronized by the console_lock.

Rather than relying on the console_lock to determine suspended
state, make it an official per-console state that is set within
console->flags. This allows the state to be queried via SRCU.

@console_suspended will continue to exist, but now only to implement
the console_lock/console_unlock trickery and _not_ to represent
the suspend state of a particular console.
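
For illustration, printing contexts can then query the new flag
locklessly under SRCU. A minimal sketch of the reader side (the loop
body is elided):

  int cookie;
  struct console *con;

  cookie = console_srcu_read_lock();
  for_each_console_srcu(con) {
          short flags = console_srcu_read_flags(con);

          if (flags & CON_SUSPENDED)
                  continue;       /* printing callbacks must not be called */

          /* @con may be considered for printing here. */
  }
  console_srcu_read_unlock(cookie);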

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h |  3 +++
 kernel/printk/printk.c  | 46 ++++++++++++++++++++++++++++++++---------
 2 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 1e36958aa656..f7967fb238e0 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
  *			receiving the printk spam for obvious reasons.
  * @CON_EXTENDED:	The console supports the extended output format of
  *			/dev/kmesg which requires a larger output buffer.
+ * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
+ *			printing callbacks must not be called.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -162,6 +164,7 @@ enum cons_flags {
 	CON_ANYTIME		= BIT(4),
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
+	CON_SUSPENDED		= BIT(7),
 };
 
 /**
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bdeaf12e0bd2..626d467c7e9b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2563,10 +2563,26 @@ MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
  */
 void suspend_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
 	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
 	pr_flush(1000, true);
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see that they are suspended so that it
+	 * is guaranteed that all printing has stopped when this function
+	 * completes.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	console_lock();
 	console_suspended = 1;
 	up_console_sem();
@@ -2574,11 +2590,26 @@ void suspend_console(void)
 
 void resume_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see they are no longer suspended so
+	 * that they are guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	pr_flush(1000, true);
 }
 
@@ -2681,6 +2712,9 @@ static inline bool console_is_usable(struct console *con)
 	if (!(flags & CON_ENABLED))
 		return false;
 
+	if ((flags & CON_SUSPENDED))
+		return false;
+
 	if (!con->write)
 		return false;
 
@@ -3695,8 +3729,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 
 		/*
 		 * Hold the console_lock to guarantee safe access to
-		 * console->seq and to prevent changes to @console_suspended
-		 * until all consoles have been processed.
+		 * console->seq.
 		 */
 		console_lock();
 
@@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 		}
 		console_srcu_read_unlock(cookie);
 
-		/*
-		 * If consoles are suspended, it cannot be expected that they
-		 * make forward progress, so timeout immediately. @diff is
-		 * still used to return a valid flush status.
-		 */
-		if (console_suspended)
-			remaining = 0;
-		else if (diff != last_diff && reset_on_progress)
+		if (diff != last_diff && reset_on_progress)
 			remaining = timeout_ms;
 
 		console_unlock();
-- 
2.30.2



* [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

From: Thomas Gleixner <tglx@linutronix.de>

The current console/printk subsystem is protected by a Big Kernel Lock
(aka console_lock) which has ill-defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and
makes forced takeover and output in emergency and panic situations a
fragile endeavour based on try and pray.

The goal of non-BKL consoles is to break out of the console lock jail
and to provide a new infrastructure that avoids the pitfalls and
allows console drivers to be gradually converted over.

The proposed infrastructure aims for the following properties:

  - Per console locking instead of global locking
  - Per console state which allows to make informed decisions
  - Stateful handover and takeover

As a first step, state is added to struct console. The per-console
state is an atomic_long_t with a 32bit bit field and, on 64bit, also a
32bit sequence for tracking the last printed ringbuffer sequence
number. On 32bit the sequence is kept separate from the state for
obvious reasons, which requires handling a few extra race conditions.

Reserve state bits, which will be populated later in the series. Wire
it up into the console register/unregister functionality and exclude
such consoles from being handled in the console BKL mechanisms. Since
the non-BKL consoles will not depend on the console lock/unlock dance
for printing, only perform said dance if a BKL console is registered.

The decision to use a bitfield was made because using a plain u32 with
mask/shift operations turned out to result in incomprehensible code.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 fs/proc/consoles.c           |   1 +
 include/linux/console.h      |  33 +++++++++
 kernel/printk/Makefile       |   2 +-
 kernel/printk/internal.h     |  10 +++
 kernel/printk/printk.c       |  50 +++++++++++--
 kernel/printk/printk_nobkl.c | 137 +++++++++++++++++++++++++++++++++++
 6 files changed, 226 insertions(+), 7 deletions(-)
 create mode 100644 kernel/printk/printk_nobkl.c

diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c
index e0758fe7936d..9ce506866e60 100644
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -21,6 +21,7 @@ static int show_console_dev(struct seq_file *m, void *v)
 		{ CON_ENABLED,		'E' },
 		{ CON_CONSDEV,		'C' },
 		{ CON_BOOT,		'B' },
+		{ CON_NO_BKL,		'N' },
 		{ CON_PRINTBUFFER,	'p' },
 		{ CON_BRL,		'b' },
 		{ CON_ANYTIME,		'a' },
diff --git a/include/linux/console.h b/include/linux/console.h
index f7967fb238e0..b9d2ad580128 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -155,6 +155,8 @@ static inline int con_debug_leave(void)
  *			/dev/kmesg which requires a larger output buffer.
  * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
  *			printing callbacks must not be called.
+ * @CON_NO_BKL:		Console can operate outside of the BKL style console_lock
+ *			constraints.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -165,6 +167,32 @@ enum cons_flags {
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
 	CON_SUSPENDED		= BIT(7),
+	CON_NO_BKL		= BIT(8),
+};
+
+/**
+ * struct cons_state - console state for NOBKL consoles
+ * @atom:	Compound of the state fields for atomic operations
+ * @seq:	Sequence for record tracking (64bit only)
+ * @bits:	Compound of the state bits below
+ *
+ * To be used for state read and preparation of atomic_long_cmpxchg()
+ * operations.
+ */
+struct cons_state {
+	union {
+		unsigned long	atom;
+		struct {
+#ifdef CONFIG_64BIT
+			u32	seq;
+#endif
+			union {
+				u32	bits;
+				struct {
+				};
+			};
+		};
+	};
 };
 
 /**
@@ -186,6 +214,8 @@ enum cons_flags {
  * @dropped:		Number of unreported dropped ringbuffer records
  * @data:		Driver private data
  * @node:		hlist node for the console list
+ *
+ * @atomic_state:	State array for NOBKL consoles; real and handover
  */
 struct console {
 	char			name[16];
@@ -205,6 +235,9 @@ struct console {
 	unsigned long		dropped;
 	void			*data;
 	struct hlist_node	node;
+
+	/* NOBKL console specific members */
+	atomic_long_t		__private atomic_state[2];
 };
 
 #ifdef CONFIG_LOCKDEP
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index f5b388e810b9..b36683bd2f82 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y	= printk.o
-obj-$(CONFIG_PRINTK)	+= printk_safe.o
+obj-$(CONFIG_PRINTK)	+= printk_safe.o printk_nobkl.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
 obj-$(CONFIG_PRINTK_INDEX)	+= index.o
 
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a17704136f1..da380579263b 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -3,6 +3,7 @@
  * internal.h - printk internal definitions
  */
 #include <linux/percpu.h>
+#include <linux/console.h>
 
 #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
 void __init printk_sysctl_init(void);
@@ -61,6 +62,10 @@ void defer_console_output(void);
 
 u16 printk_parse_prefix(const char *text, int *level,
 			enum printk_info_flags *flags);
+
+void cons_nobkl_cleanup(struct console *con);
+void cons_nobkl_init(struct console *con);
+
 #else
 
 #define PRINTK_PREFIX_MAX	0
@@ -76,8 +81,13 @@ u16 printk_parse_prefix(const char *text, int *level,
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
 static inline bool printk_percpu_data_ready(void) { return false; }
+static inline void cons_nobkl_init(struct console *con) { }
+static inline void cons_nobkl_cleanup(struct console *con) { }
+
 #endif /* CONFIG_PRINTK */
 
+extern bool have_boot_console;
+
 /**
  * struct printk_buffers - Buffers to read/format/output printk messages.
  * @outbuf:	After formatting, contains text to output.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 626d467c7e9b..b2c7c92c3d79 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -446,6 +446,19 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_MUTEX(syslog_lock);
 
+/*
+ * Specifies if a BKL console was ever registered. Used to determine if the
+ * console lock/unlock dance is needed for console printing.
+ */
+static bool have_bkl_console;
+
+/*
+ * Specifies if a boot console is registered. Used to determine if NOBKL
+ * consoles may be used since NOBKL consoles cannot synchronize with boot
+ * consoles.
+ */
+bool have_boot_console;
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
@@ -2301,7 +2314,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (!in_sched && have_bkl_console) {
 		/*
 		 * The caller may be holding system-critical or
 		 * timing-sensitive locks. Disable preemption during
@@ -2624,7 +2637,7 @@ void resume_console(void)
  */
 static int console_cpu_notify(unsigned int cpu)
 {
-	if (!cpuhp_tasks_frozen) {
+	if (!cpuhp_tasks_frozen && have_bkl_console) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
@@ -3098,6 +3111,9 @@ void console_unblank(void)
 	struct console *c;
 	int cookie;
 
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * Stop console printing because the unblank() callback may
 	 * assume the console is not within its write() callback.
@@ -3135,6 +3151,9 @@ void console_unblank(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * If someone else is holding the console lock, trylock will fail
 	 * and may_schedule may be set.  Ignore and proceed to unlock so
@@ -3310,9 +3329,10 @@ static void try_enable_default_console(struct console *newcon)
 		newcon->flags |= CON_CONSDEV;
 }
 
-#define con_printk(lvl, con, fmt, ...)			\
-	printk(lvl pr_fmt("%sconsole [%s%d] " fmt),	\
-	       (con->flags & CON_BOOT) ? "boot" : "",	\
+#define con_printk(lvl, con, fmt, ...)				\
+	printk(lvl pr_fmt("%s%sconsole [%s%d] " fmt),		\
+	       (con->flags & CON_NO_BKL) ? "" : "legacy ",	\
+	       (con->flags & CON_BOOT) ? "boot" : "",		\
 	       con->name, con->index, ##__VA_ARGS__)
 
 static void console_init_seq(struct console *newcon, bool bootcon_registered)
@@ -3472,6 +3492,14 @@ void register_console(struct console *newcon)
 	newcon->dropped = 0;
 	console_init_seq(newcon, bootcon_registered);
 
+	if (!(newcon->flags & CON_NO_BKL))
+		have_bkl_console = true;
+	else
+		cons_nobkl_init(newcon);
+
+	if (newcon->flags & CON_BOOT)
+		have_boot_console = true;
+
 	/*
 	 * Put this console in the list - keep the
 	 * preferred driver at the head of the list.
@@ -3515,6 +3543,9 @@ void register_console(struct console *newcon)
 			if (con->flags & CON_BOOT)
 				unregister_console_locked(con);
 		}
+
+		/* All boot consoles have been unregistered. */
+		have_boot_console = false;
 	}
 unlock:
 	console_list_unlock();
@@ -3563,6 +3594,9 @@ static int unregister_console_locked(struct console *console)
 	 */
 	synchronize_srcu(&console_srcu);
 
+	if (console->flags & CON_NO_BKL)
+		cons_nobkl_cleanup(console);
+
 	console_sysfs_notify();
 
 	if (console->exit)
@@ -3866,11 +3900,15 @@ void wake_up_klogd(void)
  */
 void defer_console_output(void)
 {
+	int val = PRINTK_PENDING_WAKEUP;
+
 	/*
 	 * New messages may have been added directly to the ringbuffer
 	 * using vprintk_store(), so wake any waiters as well.
 	 */
-	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
+	if (have_bkl_console)
+		val |= PRINTK_PENDING_OUTPUT;
+	__wake_up_klogd(val);
 }
 
 void printk_trigger_flush(void)
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
new file mode 100644
index 000000000000..8df3626808dd
--- /dev/null
+++ b/kernel/printk/printk_nobkl.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2022 Linutronix GmbH, John Ogness
+// Copyright (C) 2022 Intel, Thomas Gleixner
+
+#include <linux/kernel.h>
+#include <linux/console.h>
+#include "internal.h"
+/*
+ * Printk implementation for consoles that do not depend on the BKL style
+ * console_lock mechanism.
+ *
+ * Console is locked on a CPU when state::locked is set and state::cpu ==
+ * current CPU. This is valid for the current execution context.
+ *
+ * Nesting execution contexts on the same CPU can carefully take over
+ * if the driver allows reentrancy via state::unsafe = false. When the
+ * interrupted context resumes it checks the state before entering
+ * an unsafe region and aborts the operation if it detects a takeover.
+ *
+ * In case of panic or emergency the nesting context can take over the
+ * console forcefully. The write callback is then invoked with the unsafe
+ * flag set in the write context data, which allows the driver side to avoid
+ * locks and to evaluate the driver state so it can use an emergency path
+ * or repair the state instead of blindly assuming that it works.
+ *
+ * If the interrupted context touches the assigned record buffer after
+ * takeover, it does not cause harm because at the same execution level
+ * there is no concurrency on the same CPU. A threaded printer always has
+ * its own record buffer so it can never interfere with any of the per CPU
+ * record buffers.
+ *
+ * A concurrent writer on a different CPU can request to take over the
+ * console by:
+ *
+ *	1) Carefully writing the desired state into state[REQ]
+ *	   if there is no same or higher priority request pending.
+ *	   This locks state[REQ] except for higher priority
+ *	   waiters.
+ *
+ *	2) Setting state[CUR].req_prio unless a same or higher
+ *	   priority waiter won the race.
+ *
+ *	3) Carefully spin on state[CUR] until that is locked with the
+ *	   expected state. When the state is not the expected one then it
+ *	   has to verify that state[REQ] is still the same and that
+ *	   state[CUR] has not been taken over or unlocked.
+ *
+ *      The unlocker hands over to state[REQ], but only if state[CUR]
+ *	matches.
+ *
+ * In case the owner does not react to the request and does not make
+ * observable progress, the waiter will timeout and can then decide to do
+ * a hostile takeover.
+ */
+
+#define copy_full_state(_dst, _src)	do { _dst = _src; } while (0)
+#define copy_bit_state(_dst, _src)	do { _dst.bits = _src.bits; } while (0)
+
+#ifdef CONFIG_64BIT
+#define copy_seq_state64(_dst, _src)	do { _dst.seq = _src.seq; } while (0)
+#else
+#define copy_seq_state64(_dst, _src)	do { } while (0)
+#endif
+
+enum state_selector {
+	CON_STATE_CUR,
+	CON_STATE_REQ,
+};
+
+/**
+ * cons_state_set - Helper function to set the console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @new:	The new state to write
+ *
+ * Only to be used when the console is not yet or no longer visible in the
+ * system. Otherwise use cons_state_try_cmpxchg().
+ */
+static inline void cons_state_set(struct console *con, enum state_selector which,
+				  struct cons_state *new)
+{
+	atomic_long_set(&ACCESS_PRIVATE(con, atomic_state[which]), new->atom);
+}
+
+/**
+ * cons_state_read - Helper function to read the console state
+ * @con:	Console to read from
+ * @which:	Selects real state or handover state
+ * @state:	Pointer to store the result
+ */
+static inline void cons_state_read(struct console *con, enum state_selector which,
+				   struct cons_state *state)
+{
+	state->atom = atomic_long_read(&ACCESS_PRIVATE(con, atomic_state[which]));
+}
+
+/**
+ * cons_state_try_cmpxchg() - Helper function for atomic_long_try_cmpxchg() on console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @old:	Old/expected state
+ * @new:	New state
+ *
+ * Returns: True on success, false on fail
+ */
+static inline bool cons_state_try_cmpxchg(struct console *con,
+					  enum state_selector which,
+					  struct cons_state *old,
+					  struct cons_state *new)
+{
+	return atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_state[which]),
+				       &old->atom, new->atom);
+}
+
+/**
+ * cons_nobkl_init - Initialize the NOBKL console specific data
+ * @con:	Console to initialize
+ */
+void cons_nobkl_init(struct console *con)
+{
+	struct cons_state state = { };
+
+	cons_state_set(con, CON_STATE_CUR, &state);
+	cons_state_set(con, CON_STATE_REQ, &state);
+}
+
+/**
+ * cons_nobkl_cleanup - Cleanup the NOBKL console specific data
+ * @con:	Console to cleanup
+ */
+void cons_nobkl_cleanup(struct console *con)
+{
+	struct cons_state state = { };
+
+	cons_state_set(con, CON_STATE_CUR, &state);
+	cons_state_set(con, CON_STATE_REQ, &state);
+}
-- 
2.30.2



* [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Add per console acquire/release functionality. The console 'locked'
state is a combination of several state fields:

  - The 'locked' bit

  - The 'cpu' field that denotes on which CPU the console is locked

  - The 'cur_prio' field that contains the severity of the printk
    context that owns the console. This field is used for decisions
    whether to attempt friendly handovers and also prevents takeovers
    from a less severe context, e.g. to protect the panic CPU.

The acquire mechanism comes with several flavours:

  - Straightforward acquire when the console is not contended

  - Friendly handover mechanism based on a request/grant handshake

    The requesting context:

      1) Puts the desired handover state (CPU nr, prio) into a
         separate handover state

      2) Sets the 'req_prio' field in the real console state

      3) Waits (with a timeout) for the owning context to handover

    The owning context:

      1) Observes the 'req_prio' field set

      2) Hands the console over to the requesting context by
         switching the console state to the handover state that was
         provided by the requester

  - Hostile takeover

      The new owner takes the console over without handshake

      This is required when friendly handovers are not possible,
      i.e. the higher priority context interrupted the owning context
      on the same CPU or the owning context is not able to make
      progress on a remote CPU.

The release is the counterpart which either releases the console
directly or hands it gracefully over to a requester.

All operations on console::atomic_state[CUR|REQ] are atomic
cmpxchg based to handle concurrency.
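
For example, with the helpers introduced earlier in the series
(cons_state_read() and cons_state_try_cmpxchg()), the uncontended
acquire path reduces to a read/cmpxchg loop over the compound state.
A simplified sketch, with the contended cases omitted:

  static bool try_acquire_direct(struct cons_context *ctxt)
  {
          struct console *con = ctxt->console;
          struct cons_state old;
          struct cons_state new;

          cons_state_read(con, CON_STATE_CUR, &old);
          do {
                  if (old.locked)
                          return false;           /* contended: needs a handover */

                  copy_full_state(new, old);      /* carries seq along on 64bit */
                  new.locked = 1;
                  new.cur_prio = ctxt->prio;
                  new.cpu = smp_processor_id();
          } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new));

          return true;
  }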

The acquire/release functions implement only minimal policies:

  - Preference for higher priority contexts
  - Protection of the panic CPU

All other policy decisions have to be made at the call sites.

The design allows implementing the well-known:

    acquire()
    output_one_line()
    release()

algorithm, but also allows avoiding the per-line acquire/release for
e.g. panic situations by doing the acquire once and then relying on
the panic CPU protection for the rest.
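
In code, the per-line pattern could look like the following sketch.
format_next_line() and emit_chars() are hypothetical placeholders for
record formatting and the driver's character output:

  static void flush_pending(struct cons_write_context *wctxt)
  {
          for (;;) {
                  if (!console_try_acquire(wctxt))
                          return;         /* owned by a higher prio context */

                  if (!format_next_line(wctxt)) {
                          console_release(wctxt);
                          return;         /* nothing left to print */
                  }

                  emit_chars(wctxt->outbuf, wctxt->len);

                  if (!console_release(wctxt))
                          return;         /* handed over: new owner flushes */
          }
  }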

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |  82 ++++++
 kernel/printk/printk_nobkl.c | 531 +++++++++++++++++++++++++++++++++++
 2 files changed, 613 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index b9d2ad580128..2c95fcc765e6 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -176,8 +176,20 @@ enum cons_flags {
  * @seq:	Sequence for record tracking (64bit only)
  * @bits:	Compound of the state bits below
  *
+ * @locked:	Console is locked by a writer
+ * @unsafe:	Console is busy in a non takeover region
+ * @cur_prio:	The priority of the current output
+ * @req_prio:	The priority of a handover request
+ * @cpu:	The CPU on which the writer runs
+ *
  * To be used for state read and preparation of atomic_long_cmpxchg()
  * operations.
+ *
+ * The @req_prio field is particularly important to allow spin-waiting to
+ * time out and give up without the risk of it being assigned the lock
+ * after giving up. The @req_prio field also has the nice side effect
+ * that it makes a single read+cmpxchg possible in the common case of
+ * acquire and release.
  */
 struct cons_state {
 	union {
@@ -189,12 +201,79 @@ struct cons_state {
 			union {
 				u32	bits;
 				struct {
+					u32 locked	:  1;
+					u32 unsafe	:  1;
+					u32 cur_prio	:  2;
+					u32 req_prio	:  2;
+					u32 cpu		: 18;
 				};
 			};
 		};
 	};
 };
 
+/**
+ * cons_prio - console writer priority for NOBKL consoles
+ * @CONS_PRIO_NONE:		Unused
+ * @CONS_PRIO_NORMAL:		Regular printk
+ * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
+ * @CONS_PRIO_PANIC:		Panic output
+ *
+ * Emergency output can carefully take over the console even without consent
+ * of the owner, ideally only when @cons_state::unsafe is not set. Panic
+ * output can ignore the unsafe flag as a last resort. If panic output is
+ * active no takeover is possible until the panic output releases the
+ * console.
+ */
+enum cons_prio {
+	CONS_PRIO_NONE = 0,
+	CONS_PRIO_NORMAL,
+	CONS_PRIO_EMERGENCY,
+	CONS_PRIO_PANIC,
+};
+
+struct console;
+
+/**
+ * struct cons_context - Context for console acquire/release
+ * @console:		The associated console
+ * @state:		The state at acquire time
+ * @old_state:		The old state when try_acquire() failed for analysis
+ *			by the caller
+ * @hov_state:		The handover state for spin and cleanup
+ * @req_state:		The request state for spin and cleanup
+ * @spinwait_max_us:	Limit for spinwait acquire
+ * @prio:		Priority of the context
+ * @hostile:		Hostile takeover requested. Cleared on normal
+ *			acquire or friendly handover
+ * @spinwait:		Spinwait on acquire if possible
+ */
+struct cons_context {
+	struct console		*console;
+	struct cons_state	state;
+	struct cons_state	old_state;
+	struct cons_state	hov_state;
+	struct cons_state	req_state;
+	unsigned int		spinwait_max_us;
+	enum cons_prio		prio;
+	unsigned int		hostile		: 1;
+	unsigned int		spinwait	: 1;
+};
+
+/**
+ * struct cons_write_context - Context handed to the write callbacks
+ * @ctxt:	The core console context
+ * @outbuf:	Pointer to the text buffer for output
+ * @len:	Length to write
+ * @unsafe:	Invoked in unsafe state due to force takeover
+ */
+struct cons_write_context {
+	struct cons_context	__private ctxt;
+	char			*outbuf;
+	unsigned int		len;
+	bool			unsafe;
+};
+
 /**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
@@ -364,6 +443,9 @@ static inline bool console_is_registered(const struct console *con)
 	lockdep_assert_console_list_lock_held();			\
 	hlist_for_each_entry(con, &console_list, node)
 
+extern bool console_try_acquire(struct cons_write_context *wctxt);
+extern bool console_release(struct cons_write_context *wctxt);
+
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 8df3626808dd..78136347a328 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -4,6 +4,7 @@
 
 #include <linux/kernel.h>
 #include <linux/console.h>
+#include <linux/delay.h>
 #include "internal.h"
 /*
  * Printk implementation for consoles that do not depend on the BKL style
@@ -112,6 +113,536 @@ static inline bool cons_state_try_cmpxchg(struct console *con,
 				       &old->atom, new->atom);
 }
 
+/**
+ * cons_state_full_match - Check whether the full state matches
+ * @cur:	The state to check
+ * @prev:	The previous state
+ *
+ * Returns: True if matching, false otherwise.
+ *
+ * Check the full state including state::seq on 64bit. For takeover
+ * detection.
+ */
+static inline bool cons_state_full_match(struct cons_state cur,
+					 struct cons_state prev)
+{
+	/*
+	 * req_prio can be set by a concurrent writer for friendly
+	 * handover. Ignore it in the comparison.
+	 */
+	cur.req_prio = prev.req_prio;
+	return cur.atom == prev.atom;
+}
+
+/**
+ * cons_state_bits_match - Check for matching state bits
+ * @cur:	The state to check
+ * @prev:	The previous state
+ *
+ * Returns: True if state matches, false otherwise.
+ *
+ * Contrary to cons_state_full_match this checks only the bits and ignores
+ * a sequence change on 64bit. On 32bit the two functions are identical.
+ */
+static inline bool cons_state_bits_match(struct cons_state cur, struct cons_state prev)
+{
+	/*
+	 * req_prio can be set by a concurrent writer for friendly
+	 * handover. Ignore it in the comparison.
+	 */
+	cur.req_prio = prev.req_prio;
+	return cur.bits == prev.bits;
+}
+
+/**
+ * cons_check_panic - Check whether a remote CPU is in panic
+ *
+ * Returns: True if a remote CPU is in panic, false otherwise.
+ */
+static inline bool cons_check_panic(void)
+{
+	unsigned int pcpu = atomic_read(&panic_cpu);
+
+	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
+}
+
+/**
+ * cons_cleanup_handover - Cleanup a handover request
+ * @ctxt:	Pointer to acquire context
+ *
+ * @ctxt->hov_state contains the state to clean up
+ */
+static void cons_cleanup_handover(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state new;
+
+	/*
+	 * No loop required. Either hov_state is still the same or
+	 * not.
+	 */
+	new.atom = 0;
+	cons_state_try_cmpxchg(con, CON_STATE_REQ, &ctxt->hov_state, &new);
+}
+
+/**
+ * cons_setup_handover - Setup a handover request
+ * @ctxt:	Pointer to acquire context
+ *
+ * Returns: True if a handover request was setup, false otherwise.
+ *
+ * On success @ctxt->hov_state contains the requested handover state
+ *
+ * On failure this context is not allowed to request a handover from the
+ * current owner. Reasons would be priority too low or a remote CPU in panic.
+ * In both cases this context should give up trying to acquire the console.
+ */
+static bool cons_setup_handover(struct cons_context *ctxt)
+{
+	unsigned int cpu = smp_processor_id();
+	struct console *con = ctxt->console;
+	struct cons_state old;
+	struct cons_state hstate = {
+		.locked		= 1,
+		.cur_prio	= ctxt->prio,
+		.cpu		= cpu,
+	};
+
+	/*
+	 * Try to store hstate in @con->atomic_state[REQ]. This might
+	 * race with a higher priority waiter.
+	 */
+	cons_state_read(con, CON_STATE_REQ, &old);
+	do {
+		if (cons_check_panic())
+			return false;
+
+		/* Same or higher priority waiter exists? */
+		if (old.cur_prio >= ctxt->prio)
+			return false;
+
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &old, &hstate));
+
+	/* Save that state for comparison in spinwait */
+	copy_full_state(ctxt->hov_state, hstate);
+	return true;
+}
+
+/**
+ * cons_setup_request - Setup a handover request in state[CUR]
+ * @ctxt:	Pointer to acquire context
+ * @old:	The state that was used to make the decision to spin wait
+ *
+ * Returns: True if a handover request was setup in state[CUR], false
+ * otherwise.
+ *
+ * On success @ctxt->req_state contains the request state that was set in
+ * state[CUR]
+ *
+ * On failure this context encountered unexpected state values. This
+ * context should retry the full handover request setup process (the
+ * handover request setup by cons_setup_handover() is now invalidated
+ * and must be performed again).
+ */
+static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
+{
+	struct console *con = ctxt->console;
+	struct cons_state cur;
+	struct cons_state new;
+
+	/* Now set the request in state[CUR] */
+	cons_state_read(con, CON_STATE_CUR, &cur);
+	do {
+		if (cons_check_panic())
+			goto cleanup;
+
+		/* Bit state changed vs. the decision to spinwait? */
+		if (!cons_state_bits_match(cur, old))
+			goto cleanup;
+
+		/*
+		 * A higher or equal priority context already setup a
+		 * request?
+		 */
+		if (cur.req_prio >= ctxt->prio)
+			goto cleanup;
+
+		/* Setup a request for handover. */
+		copy_full_state(new, cur);
+		new.req_prio = ctxt->prio;
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
+
+	/* Save that state for comparison in spinwait */
+	copy_bit_state(ctxt->req_state, new);
+	return true;
+
+cleanup:
+	cons_cleanup_handover(ctxt);
+	return false;
+}
+
+/**
+ * cons_try_acquire_spin - Complete the spinwait attempt
+ * @ctxt:	Pointer to an acquire context that contains
+ *		all information about the acquire mode
+ *
+ * @ctxt->hov_state contains the handover state that was set in
+ * state[REQ]
+ * @ctxt->req_state contains the request state that was set in
+ * state[CUR]
+ *
+ * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
+ * unexpected state values.
+ *
+ * On success @ctxt->state contains the new state that was set in
+ * state[CUR]
+ *
+ * On -EBUSY failure this context timed out. This context should either
+ * give up or attempt a hostile takeover.
+ *
+ * On -EAGAIN failure this context encountered unexpected state values.
+ * This context should retry the full handover request setup process (the
+ * handover request setup by cons_setup_handover() is now invalidated and
+ * must be performed again).
+ */
+static int cons_try_acquire_spin(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state cur;
+	struct cons_state new;
+	int err = -EAGAIN;
+	int timeout;
+
+	/* Now wait for the other side to hand over */
+	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
+		/* Timeout immediately if a remote panic is detected. */
+		if (cons_check_panic())
+			break;
+
+		cons_state_read(con, CON_STATE_CUR, &cur);
+
+		/*
+		 * If the real state of the console matches the handover state
+		 * that this context setup, then the handover was a success
+		 * and this context is now the owner.
+		 *
+		 * Note that this might have raced with a new higher priority
+		 * requester coming in after the lock was handed over.
+		 * However, that requester will see that the owner changes and
+		 * setup a new request for the current owner (this context).
+		 */
+		if (cons_state_bits_match(cur, ctxt->hov_state))
+			goto success;
+
+		/*
+		 * If state changed since the request was made, give up as
+		 * it is no longer consistent. This must include
+		 * state::req_prio since there could be a higher priority
+		 * request available.
+		 */
+		if (cur.bits != ctxt->req_state.bits)
+			goto cleanup;
+
+		/*
+		 * Finally check whether the handover state is still
+		 * the same.
+		 */
+		cons_state_read(con, CON_STATE_REQ, &cur);
+		if (cur.atom != ctxt->hov_state.atom)
+			goto cleanup;
+
+		/* Account time */
+		if (timeout > 0)
+			udelay(1);
+	}
+
+	/*
+	 * Timeout. Cleanup the handover state and carefully try to reset
+	 * req_prio in the real state. The reset is important to ensure
+	 * that the owner does not hand over the lock after this context
+	 * has given up waiting.
+	 */
+	cons_cleanup_handover(ctxt);
+
+	cons_state_read(con, CON_STATE_CUR, &cur);
+	do {
+		/*
+		 * The timeout might have raced with the owner coming late
+		 * and handing it over gracefully.
+		 */
+		if (cons_state_bits_match(cur, ctxt->hov_state))
+			goto success;
+
+		/*
+		 * Validate that the state matches with the state at request
+		 * time. If this check fails, there is already a higher
+		 * priority context waiting or the owner has changed (either
+		 * by higher priority or by hostile takeover). In all fail
+		 * cases this context is no longer in line for a handover to
+		 * take place, so no reset is necessary.
+		 */
+		if (cur.bits != ctxt->req_state.bits)
+			goto cleanup;
+
+		copy_full_state(new, cur);
+		new.req_prio = 0;
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
+	/* Reset worked. Report timeout. */
+	return -EBUSY;
+
+success:
+	/* Store the real state */
+	copy_full_state(ctxt->state, cur);
+	ctxt->hostile = false;
+	err = 0;
+
+cleanup:
+	cons_cleanup_handover(ctxt);
+	return err;
+}
+
+/**
+ * __cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt:	Pointer to an acquire context that contains
+ *		all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False on fail.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of fail @ctxt->old_state contains the state
+ * that was read from @con->state for analysis by the caller.
+ */
+static bool __cons_try_acquire(struct cons_context *ctxt)
+{
+	unsigned int cpu = smp_processor_id();
+	struct console *con = ctxt->console;
+	short flags = console_srcu_read_flags(con);
+	struct cons_state old;
+	struct cons_state new;
+	int err;
+
+	if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
+		return false;
+again:
+	cons_state_read(con, CON_STATE_CUR, &old);
+
+	/* Preserve it for the caller and for spinwait */
+	copy_full_state(ctxt->old_state, old);
+
+	if (cons_check_panic())
+		return false;
+
+	/* Set up the new state for takeover */
+	copy_full_state(new, old);
+	new.locked = 1;
+	new.cur_prio = ctxt->prio;
+	new.req_prio = CONS_PRIO_NONE;
+	new.cpu = cpu;
+
+	/* Attempt to acquire it directly if unlocked */
+	if (!old.locked) {
+		if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+			goto again;
+
+		ctxt->hostile = false;
+		copy_full_state(ctxt->state, new);
+		goto success;
+	}
+
+	/*
+	 * If the active context is on the same CPU then there is
+	 * obviously no handshake possible.
+	 */
+	if (old.cpu == cpu)
+		goto check_hostile;
+
+	/*
+	 * If a handover request with same or higher priority is already
+	 * pending then this context cannot setup a handover request.
+	 */
+	if (old.req_prio >= ctxt->prio)
+		goto check_hostile;
+
+	/*
+	 * If the caller did not request spin-waiting then performing a
+	 * handover is not an option.
+	 */
+	if (!ctxt->spinwait)
+		goto check_hostile;
+
+	/*
+	 * Setup the request in state[REQ]. If this fails then this
+	 * context is not allowed to setup a handover request.
+	 */
+	if (!cons_setup_handover(ctxt))
+		goto check_hostile;
+
+	/*
+	 * Setup the request in state[CUR]. Hand in the state that was
+	 * used to make the decision to spinwait above, for comparison. If
+	 * this fails then unexpected state values were encountered and the
+	 * full request setup process is retried.
+	 */
+	if (!cons_setup_request(ctxt, old))
+		goto again;
+
+	/*
+	 * Spin-wait to acquire the console. If this fails then unexpected
+	 * state values were encountered (for example, a hostile takeover by
+	 * another context) and the full request setup process is retried.
+	 */
+	err = cons_try_acquire_spin(ctxt);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		goto check_hostile;
+	}
+success:
+	/* Common updates on success */
+	return true;
+
+check_hostile:
+	if (!ctxt->hostile)
+		return false;
+
+	if (cons_check_panic())
+		return false;
+
+	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+		goto again;
+
+	copy_full_state(ctxt->state, new);
+	goto success;
+}
+
+/**
+ * cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt:	Pointer to an acquire context that contains
+ *		all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False on fail.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of fail @ctxt->old_state contains the state
+ * that was read from @con->state for analysis by the caller.
+ */
+static bool cons_try_acquire(struct cons_context *ctxt)
+{
+	if (__cons_try_acquire(ctxt))
+		return true;
+
+	ctxt->state.atom = 0;
+	return false;
+}
+
+/**
+ * __cons_release - Release the console after output is done
+ * @ctxt:	The acquire context that contains the state
+ *		at cons_try_acquire()
+ *
+ * Returns:	True if the release was regular
+ *
+ *		False if the console is in an unusable state, was handed over
+ *		with handshake, or was hostilely taken over without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool __cons_release(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	short flags = console_srcu_read_flags(con);
+	struct cons_state hstate;
+	struct cons_state old;
+	struct cons_state new;
+
+	if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
+		return false;
+
+	cons_state_read(con, CON_STATE_CUR, &old);
+again:
+	if (!cons_state_bits_match(old, ctxt->state))
+		return false;
+
+	/* Release it directly when no handover request is pending. */
+	if (!old.req_prio)
+		goto unlock;
+
+	/* Read the handover target state */
+	cons_state_read(con, CON_STATE_REQ, &hstate);
+
+	/* If the waiter gave up hstate is 0 */
+	if (!hstate.atom)
+		goto unlock;
+
+	/*
+	 * If a higher priority waiter raced against a lower priority
+	 * waiter then unlock instead of handing over to either. The
+	 * higher priority waiter will notice the updated state and
+	 * retry.
+	 */
+	if (hstate.cur_prio != old.req_prio)
+		goto unlock;
+
+	/* Switch the state and preserve the sequence on 64bit */
+	copy_bit_state(new, hstate);
+	copy_seq_state64(new, old);
+	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+		goto again;
+
+	return true;
+
+unlock:
+	/* Clear the state and preserve the sequence on 64bit */
+	new.atom = 0;
+	copy_seq_state64(new, old);
+	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+		goto again;
+
+	return true;
+}
+
+/**
+ * cons_release - Release the console after output is done
+ * @ctxt:	The acquire context that contains the state
+ *		at cons_try_acquire()
+ *
+ * Returns:	True if the release was regular
+ *
+ *		False if the console is in an unusable state or was handed over
+ *		with handshake or taken over in a hostile fashion without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool cons_release(struct cons_context *ctxt)
+{
+	bool ret = __cons_release(ctxt);
+
+	ctxt->state.atom = 0;
+	return ret;
+}
+
+bool console_try_acquire(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	return cons_try_acquire(ctxt);
+}
+EXPORT_SYMBOL(console_try_acquire);
+
+bool console_release(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	return cons_release(ctxt);
+}
+EXPORT_SYMBOL(console_release);
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console specific data
  * @con:	Console to initialize
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread
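
Taken together, the exported acquire/release pair brackets all output on a
NOBKL console. A minimal sketch of a caller, assuming the emergency
priority level of this series and leaving the actual record emission to
later patches:

	struct cons_write_context wctxt = {
		.ctxt.console		= con,
		.ctxt.prio		= CONS_PRIO_EMERGENCY,
		.ctxt.spinwait_max_us	= 2000,
		.ctxt.spinwait		= 1,
	};

	if (!console_try_acquire(&wctxt))
		return;		/* Owned by an equal or higher priority context */

	/* ... emit pending records ... */

	if (!console_release(&wctxt))
		return;		/* Irregular release, re-evaluate further printing */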

* [PATCH printk v1 07/18] printk: nobkl: Add buffer management
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (5 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-21 16:38   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 08/18] printk: nobkl: Add sequence handling John Ogness
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

In case of hostile takeovers it must be ensured that the previous
owner cannot scribble over the output buffer of the emergency/panic
context. This is achieved by:

 - Adding a global output buffer instance for early boot (pre per CPU
   data being available).

 - Allocating an output buffer per console for threaded printers once
   printer threads become available.

 - Allocating per CPU output buffers per console for printing from
   all contexts not covered by the other buffers.

 - Choosing the appropriate buffer in the acquire/release
   functions.

The output buffer is wrapped into a separate data structure so that
further context-related fields can be added in later steps.
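
In other words, a printing context ends up with exactly one of three
buffers. A summary sketch of the selection implemented below in
cons_context_set_pbufs():

	/*
	 * early boot (no per CPU data yet):  early_cons_ctxt_data.pbufs
	 * printer thread context:            con->thread_pbufs
	 * any other context:                 this_cpu_ptr(con->pcpu_data)->pbufs
	 */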

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      | 13 ++++++
 kernel/printk/internal.h     | 22 +++++++--
 kernel/printk/printk.c       | 26 +++++++----
 kernel/printk/printk_nobkl.c | 90 +++++++++++++++++++++++++++++++++++-
 4 files changed, 137 insertions(+), 14 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 2c95fcc765e6..3d989104240f 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -178,6 +178,7 @@ enum cons_flags {
  *
  * @locked:	Console is locked by a writer
  * @unsafe:	Console is busy in a non takeover region
+ * @thread:	Current owner is the printk thread
  * @cur_prio:	The priority of the current output
  * @req_prio:	The priority of a handover request
  * @cpu:	The CPU on which the writer runs
@@ -203,6 +204,7 @@ struct cons_state {
 				struct {
 					u32 locked	:  1;
 					u32 unsafe	:  1;
+					u32 thread	:  1;
 					u32 cur_prio	:  2;
 					u32 req_prio	:  2;
 					u32 cpu		: 18;
@@ -233,6 +235,7 @@ enum cons_prio {
 };
 
 struct console;
+struct printk_buffers;
 
 /**
  * struct cons_context - Context for console acquire/release
@@ -244,6 +247,8 @@ struct console;
  * @req_state:		The request state for spin and cleanup
  * @spinwait_max_us:	Limit for spinwait acquire
  * @prio:		Priority of the context
+ * @pbufs:		Pointer to the text buffer for this context
+ * @thread:		The acquire is printk thread context
  * @hostile:		Hostile takeover requested. Cleared on normal
  *			acquire or friendly handover
  * @spinwait:		Spinwait on acquire if possible
@@ -256,6 +261,8 @@ struct cons_context {
 	struct cons_state	req_state;
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
+	struct printk_buffers	*pbufs;
+	unsigned int		thread		: 1;
 	unsigned int		hostile		: 1;
 	unsigned int		spinwait	: 1;
 };
@@ -274,6 +281,8 @@ struct cons_write_context {
 	bool			unsafe;
 };
 
+struct cons_context_data;
+
 /**
  * struct console - The console descriptor structure
  * @name:		The name of the console driver
@@ -295,6 +304,8 @@ struct cons_write_context {
  * @node:		hlist node for the console list
  *
  * @atomic_state:	State array for NOBKL consoles; real and handover
+ * @thread_pbufs:	Pointer to thread private buffer
+ * @pcpu_data:		Pointer to percpu context data
  */
 struct console {
 	char			name[16];
@@ -317,6 +328,8 @@ struct console {
 
 	/* NOBKL console specific members */
 	atomic_long_t		__private atomic_state[2];
+	struct printk_buffers	*thread_pbufs;
+	struct cons_context_data	__percpu *pcpu_data;
 };
 
 #ifdef CONFIG_LOCKDEP
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index da380579263b..61ecdde5c872 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -13,8 +13,13 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 #define printk_sysctl_init() do { } while (0)
 #endif
 
-#ifdef CONFIG_PRINTK
+#define con_printk(lvl, con, fmt, ...)				\
+	printk(lvl pr_fmt("%s%sconsole [%s%d] " fmt),		\
+	       (con->flags & CON_NO_BKL) ? "" : "legacy ",	\
+	       (con->flags & CON_BOOT) ? "boot" : "",		\
+	       con->name, con->index, ##__VA_ARGS__)
 
+#ifdef CONFIG_PRINTK
 #ifdef CONFIG_PRINTK_CALLER
 #define PRINTK_PREFIX_MAX	48
 #else
@@ -64,7 +69,8 @@ u16 printk_parse_prefix(const char *text, int *level,
 			enum printk_info_flags *flags);
 
 void cons_nobkl_cleanup(struct console *con);
-void cons_nobkl_init(struct console *con);
+bool cons_nobkl_init(struct console *con);
+bool cons_alloc_percpu_data(struct console *con);
 
 #else
 
@@ -81,7 +87,7 @@ void cons_nobkl_init(struct console *con);
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
 static inline bool printk_percpu_data_ready(void) { return false; }
-static inline void cons_nobkl_init(struct console *con) { }
+static inline bool cons_nobkl_init(struct console *con) { return true; }
 static inline void cons_nobkl_cleanup(struct console *con) { }
 
 #endif /* CONFIG_PRINTK */
@@ -113,3 +119,13 @@ struct printk_message {
 	u64			seq;
 	unsigned long		dropped;
 };
+
+/**
+ * struct cons_context_data - console context data
+ * @pbufs:		Buffer for storing the text
+ *
+ * Used for early boot and for per CPU data.
+ */
+struct cons_context_data {
+	struct printk_buffers		pbufs;
+};
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b2c7c92c3d79..3abefdead7ae 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -459,6 +459,8 @@ static bool have_bkl_console;
  */
 bool have_boot_console;
 
+static int unregister_console_locked(struct console *console);
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
@@ -1117,7 +1119,19 @@ static inline void log_buf_add_cpu(void) {}
 
 static void __init set_percpu_data_ready(void)
 {
+	struct hlist_node *tmp;
+	struct console *con;
+
+	console_list_lock();
+
+	hlist_for_each_entry_safe(con, tmp, &console_list, node) {
+		if (!cons_alloc_percpu_data(con))
+			unregister_console_locked(con);
+	}
+
 	__printk_percpu_data_ready = true;
+
+	console_list_unlock();
 }
 
 static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
@@ -3329,12 +3343,6 @@ static void try_enable_default_console(struct console *newcon)
 		newcon->flags |= CON_CONSDEV;
 }
 
-#define con_printk(lvl, con, fmt, ...)				\
-	printk(lvl pr_fmt("%s%sconsole [%s%d] " fmt),		\
-	       (con->flags & CON_NO_BKL) ? "" : "legacy ",	\
-	       (con->flags & CON_BOOT) ? "boot" : "",		\
-	       con->name, con->index, ##__VA_ARGS__)
-
 static void console_init_seq(struct console *newcon, bool bootcon_registered)
 {
 	struct console *con;
@@ -3399,8 +3407,6 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)
 #define console_first()				\
 	hlist_entry(console_list.first, struct console, node)
 
-static int unregister_console_locked(struct console *console);
-
 /*
  * The console driver calls this routine during kernel initialization
  * to register the console printing procedure with printk() and to
@@ -3494,8 +3500,8 @@ void register_console(struct console *newcon)
 
 	if (!(newcon->flags & CON_NO_BKL))
 		have_bkl_console = true;
-	else
-		cons_nobkl_init(newcon);
+	else if (!cons_nobkl_init(newcon))
+		goto unlock;
 
 	if (newcon->flags & CON_BOOT)
 		have_boot_console = true;
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 78136347a328..7db56ffd263a 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -166,6 +166,47 @@ static inline bool cons_check_panic(void)
 	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
 }
 
+static struct cons_context_data early_cons_ctxt_data __initdata;
+
+/**
+ * cons_context_set_pbufs - Set the output text buffer for the current context
+ * @ctxt:	Pointer to the acquire context
+ *
+ * Buffer selection:
+ *   1) Early boot uses the global (initdata) buffer
+ *   2) Printer threads use the dynamically allocated per-console buffers
+ *   3) All other contexts use the per CPU buffers
+ *
+ * This guarantees that there is no concurrent use of an output buffer ever.
+ * Early boot and per CPU nesting are not a problem. The takeover logic
+ * tells the interrupted context that the buffer has been overwritten.
+ *
+ * There are two critical regions that matter:
+ *
+ * 1) Context is filling the buffer with a record. After interruption
+ *    it continues to sprintf() the record and before it goes to
+ *    write it out, it checks the state, notices the takeover, discards
+ *    the content and backs out.
+ *
+ * 2) Context is in an unsafe critical region in the driver. After
+ *    interruption it might read overwritten data from the output
+ *    buffer. When it leaves the critical region it notices and backs
+ *    out. Hostile takeovers in driver critical regions are best effort
+ *    and there is not much that can be done about that.
+ */
+static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+
+	/* Thread context or early boot? */
+	if (ctxt->thread)
+		ctxt->pbufs = con->thread_pbufs;
+	else if (!con->pcpu_data)
+		ctxt->pbufs = &early_cons_ctxt_data.pbufs;
+	else
+		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);
+}
+
 /**
  * cons_cleanup_handover - Cleanup a handover request
  * @ctxt:	Pointer to acquire context
@@ -501,6 +542,7 @@ static bool __cons_try_acquire(struct cons_context *ctxt)
 	}
 success:
 	/* Common updates on success */
+	cons_context_set_pbufs(ctxt);
 	return true;
 
 check_hostile:
@@ -623,6 +665,9 @@ static bool cons_release(struct cons_context *ctxt)
 {
 	bool ret = __cons_release(ctxt);
 
+	/* Invalidate the buffer pointer. It is no longer valid. */
+	ctxt->pbufs = NULL;
+
 	ctxt->state.atom = 0;
 	return ret;
 }
@@ -643,16 +688,58 @@ bool console_release(struct cons_write_context *wctxt)
 }
 EXPORT_SYMBOL(console_release);
 
+/**
+ * cons_alloc_percpu_data - Allocate percpu data for a console
+ * @con:	Console to allocate for
+ *
+ * Returns: True on success. False otherwise and the console cannot be used.
+ *
+ * If it is not yet possible to allocate per CPU data, success is returned.
+ * When per CPU data becomes possible, set_percpu_data_ready() will call
+ * this function again for all registered consoles.
+ */
+bool cons_alloc_percpu_data(struct console *con)
+{
+	if (!printk_percpu_data_ready())
+		return true;
+
+	con->pcpu_data = alloc_percpu(typeof(*con->pcpu_data));
+	if (con->pcpu_data)
+		return true;
+
+	con_printk(KERN_WARNING, con, "failed to allocate percpu buffers\n");
+	return false;
+}
+
+/**
+ * cons_free_percpu_data - Free percpu data of a console on unregister
+ * @con:	Console to clean up
+ */
+static void cons_free_percpu_data(struct console *con)
+{
+	if (!con->pcpu_data)
+		return;
+
+	free_percpu(con->pcpu_data);
+	con->pcpu_data = NULL;
+}
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console specific data
  * @con:	Console to initialize
+ *
+ * Returns: True on success. False otherwise and the console cannot be used.
  */
-void cons_nobkl_init(struct console *con)
+bool cons_nobkl_init(struct console *con)
 {
 	struct cons_state state = { };
 
+	if (!cons_alloc_percpu_data(con))
+		return false;
+
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
+	return true;
 }
 
 /**
@@ -665,4 +752,5 @@ void cons_nobkl_cleanup(struct console *con)
 
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
+	cons_free_percpu_data(con);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 08/18] printk: nobkl: Add sequence handling
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (6 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 07/18] printk: nobkl: Add buffer management John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-27 15:45   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 09/18] printk: nobkl: Add print state functions John Ogness
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

On 64bit systems the sequence tracking is embedded into the atomic
console state; on 32bit it has to be stored in a separate atomic
member. The latter needs to handle the non-atomicity in hostile
takeover cases, while 64bit can completely rely on the state
atomicity.

The ringbuffer sequence number is 64bit, but having a 32bit
representation in the console is sufficient. If a console ever gets
more than 2^31 records behind the ringbuffer then this is the least
of the problems.

On acquire() the atomic 32bit sequence number is expanded to 64 bit
by folding the ringbuffer's sequence into it carefully.
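
To see the folding at work, take hypothetical values (a worked example,
not part of the code):

	/*
	 * prb_next_seq()          = 0x100000005   (full 64bit sequence)
	 * console's stored 32bits = 0xfffffffe
	 *
	 * delta    = (u32)0x100000005 - 0xfffffffe
	 *          =      0x00000005 - 0xfffffffe = 7   (32bit wraparound)
	 *
	 * expanded = 0x100000005 - 7 = 0xfffffffe
	 *
	 * The console is 7 records behind the ringbuffer. The expansion is
	 * unambiguous as long as the console lags by less than 2^31 records.
	 */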

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |   8 ++
 kernel/printk/internal.h     |   4 +
 kernel/printk/printk.c       |  61 +++++++---
 kernel/printk/printk_nobkl.c | 224 +++++++++++++++++++++++++++++++++++
 4 files changed, 280 insertions(+), 17 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 3d989104240f..942cc7f57798 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -246,6 +246,8 @@ struct printk_buffers;
  * @hov_state:		The handover state for spin and cleanup
  * @req_state:		The request state for spin and cleanup
  * @spinwait_max_us:	Limit for spinwait acquire
+ * @oldseq:		The sequence number at acquire()
+ * @newseq:		The sequence number for progress
  * @prio:		Priority of the context
  * @pbufs:		Pointer to the text buffer for this context
  * @thread:		The acquire is printk thread context
@@ -259,6 +261,8 @@ struct cons_context {
 	struct cons_state	old_state;
 	struct cons_state	hov_state;
 	struct cons_state	req_state;
+	u64			oldseq;
+	u64			newseq;
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
 	struct printk_buffers	*pbufs;
@@ -304,6 +308,7 @@ struct cons_context_data;
  * @node:		hlist node for the console list
  *
  * @atomic_state:	State array for NOBKL consoles; real and handover
+ * @atomic_seq:		Sequence for record tracking (32bit only)
  * @thread_pbufs:	Pointer to thread private buffer
  * @pcpu_data:		Pointer to percpu context data
  */
@@ -328,6 +333,9 @@ struct console {
 
 	/* NOBKL console specific members */
 	atomic_long_t		__private atomic_state[2];
+#ifndef CONFIG_64BIT
+	atomic_t		__private atomic_seq;
+#endif
 	struct printk_buffers	*thread_pbufs;
 	struct cons_context_data	__percpu *pcpu_data;
 };
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 61ecdde5c872..15a412065327 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -4,6 +4,7 @@
  */
 #include <linux/percpu.h>
 #include <linux/console.h>
+#include "printk_ringbuffer.h"
 
 #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
 void __init printk_sysctl_init(void);
@@ -41,6 +42,8 @@ enum printk_info_flags {
 	LOG_CONT	= 8,	/* text is a fragment of a continuation line */
 };
 
+extern struct printk_ringbuffer *prb;
+
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
@@ -68,6 +71,7 @@ void defer_console_output(void);
 u16 printk_parse_prefix(const char *text, int *level,
 			enum printk_info_flags *flags);
 
+u64 cons_read_seq(struct console *con);
 void cons_nobkl_cleanup(struct console *con);
 bool cons_nobkl_init(struct console *con);
 bool cons_alloc_percpu_data(struct console *con);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3abefdead7ae..21b31183ff2b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -511,7 +511,7 @@ _DEFINE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS,
 
 static struct printk_ringbuffer printk_rb_dynamic;
 
-static struct printk_ringbuffer *prb = &printk_rb_static;
+struct printk_ringbuffer *prb = &printk_rb_static;
 
 /*
  * We cannot access per-CPU data (e.g. per-CPU flush irq_work) before
@@ -2728,30 +2728,39 @@ static bool abandon_console_lock_in_panic(void)
 
 /*
  * Check if the given console is currently capable and allowed to print
- * records.
- *
- * Requires the console_srcu_read_lock.
+ * records. If the caller only works with certain types of consoles, the
+ * caller is responsible for checking the console type before calling
+ * this function.
  */
-static inline bool console_is_usable(struct console *con)
+static inline bool console_is_usable(struct console *con, short flags)
 {
-	short flags = console_srcu_read_flags(con);
-
 	if (!(flags & CON_ENABLED))
 		return false;
 
 	if ((flags & CON_SUSPENDED))
 		return false;
 
-	if (!con->write)
-		return false;
-
 	/*
-	 * Console drivers may assume that per-cpu resources have been
-	 * allocated. So unless they're explicitly marked as being able to
-	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+	 * The usability of a console varies depending on whether
+	 * it is a NOBKL console or not.
 	 */
-	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
-		return false;
+
+	if (flags & CON_NO_BKL) {
+		if (have_boot_console)
+			return false;
+
+	} else {
+		if (!con->write)
+			return false;
+		/*
+		 * Console drivers may assume that per-cpu resources have
+		 * been allocated. So unless they're explicitly marked as
+		 * being able to cope (CON_ANYTIME) don't call them until
+		 * this CPU is officially up.
+		 */
+		if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+			return false;
+	}
 
 	return true;
 }
@@ -3001,9 +3010,14 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(con) {
+			short flags = console_srcu_read_flags(con);
 			bool progress;
 
-			if (!console_is_usable(con))
+			/* console_flush_all() is only for legacy consoles. */
+			if (flags & CON_NO_BKL)
+				continue;
+
+			if (!console_is_usable(con, flags))
 				continue;
 			any_usable = true;
 
@@ -3775,10 +3789,23 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(c) {
+			short flags;
+
 			if (con && con != c)
 				continue;
-			if (!console_is_usable(c))
+
+			flags = console_srcu_read_flags(c);
+
+			if (!console_is_usable(c, flags))
 				continue;
+
+			/*
+			 * Since the console is locked, use this opportunity
+			 * to update console->seq for NOBKL consoles.
+			 */
+			if (flags & CON_NO_BKL)
+				c->seq = cons_read_seq(c);
+
 			printk_seq = c->seq;
 			if (printk_seq < seq)
 				diff += seq - printk_seq;
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 7db56ffd263a..7184a93a5b0d 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/console.h>
 #include <linux/delay.h>
+#include "printk_ringbuffer.h"
 #include "internal.h"
 /*
  * Printk implementation for consoles that do not depend on the BKL style
@@ -207,6 +208,227 @@ static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
 		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);
 }
 
+/**
+ * cons_seq_init - Helper function to initialize the console sequence
+ * @con:	Console to work on
+ *
+ * Set @con->atomic_seq to the starting record, or if that record no
+ * longer exists, the oldest available record. For init only. Do not
+ * use for runtime updates.
+ */
+static void cons_seq_init(struct console *con)
+{
+	u32 seq = (u32)max_t(u64, con->seq, prb_first_valid_seq(prb));
+#ifdef CONFIG_64BIT
+	struct cons_state state;
+
+	cons_state_read(con, CON_STATE_CUR, &state);
+	state.seq = seq;
+	cons_state_set(con, CON_STATE_CUR, &state);
+#else
+	atomic_set(&ACCESS_PRIVATE(con, atomic_seq), seq);
+#endif
+}
+
+static inline u64 cons_expand_seq(u64 seq)
+{
+	u64 rbseq;
+
+	/*
+	 * The provided sequence is only the lower 32bits of the ringbuffer
+	 * sequence. It needs to be expanded to 64bit. Get the next sequence
+	 * number from the ringbuffer and fold it.
+	 */
+	rbseq = prb_next_seq(prb);
+	seq = rbseq - ((u32)rbseq - (u32)seq);
+
+	return seq;
+}
+
+/**
+ * cons_read_seq - Read the current console sequence
+ * @con:	Console to read the sequence of
+ *
+ * Returns:	Sequence number of the next record to print on @con.
+ */
+u64 cons_read_seq(struct console *con)
+{
+	u64 seq;
+#ifdef CONFIG_64BIT
+	struct cons_state state;
+
+	cons_state_read(con, CON_STATE_CUR, &state);
+	seq = state.seq;
+#else
+	seq = atomic_read(&ACCESS_PRIVATE(con, atomic_seq));
+#endif
+	return cons_expand_seq(seq);
+}
+
+/**
+ * cons_context_set_seq - Setup the context with the next sequence to print
+ * @ctxt:	Pointer to an acquire context that contains
+ *		all information about the acquire mode
+ *
+ * On return the retrieved sequence number is stored in ctxt->oldseq.
+ *
+ * The sequence number is safe in forceful takeover situations.
+ *
+ * Either the writer succeeded in updating before it got interrupted
+ * or it failed. In the latter case the takeover will print the
+ * same line again.
+ *
+ * The sequence is only the lower 32bits of the ringbuffer sequence. The
+ * ringbuffer would have to be 2^31 records ahead to get out of sync. This
+ * needs some care when starting a console, i.e. setting the sequence to 0 is
+ * wrong. It has to be set to the oldest valid sequence in the ringbuffer
+ * as that cannot be more than 2^31 records away.
+ *
+ * On 64bit the 32bit sequence is part of console::state, which is saved
+ * in @ctxt->state. This prevents the 32bit update race.
+ */
+static void cons_context_set_seq(struct cons_context *ctxt)
+{
+#ifdef CONFIG_64BIT
+	ctxt->oldseq = ctxt->state.seq;
+#else
+	ctxt->oldseq = atomic_read(&ACCESS_PRIVATE(ctxt->console, atomic_seq));
+#endif
+	ctxt->oldseq = cons_expand_seq(ctxt->oldseq);
+	ctxt->newseq = ctxt->oldseq;
+}
+
+/**
+ * cons_seq_try_update - Try to update the console sequence number
+ * @ctxt:	Pointer to an acquire context that contains
+ *		all information about the acquire mode
+ *
+ * Returns:	True if the console sequence was updated, false otherwise.
+ *
+ * Internal helper as the logic is different on 32bit and 64bit.
+ *
+ * On 32 bit the sequence is separate from state and therefore
+ * subject to a subtle race in the case of hostile takeovers.
+ *
+ * On 64 bit the sequence is part of the state and therefore safe
+ * vs. hostile takeovers.
+ *
+ * In case of failure the console has been taken over and @ctxt is
+ * invalid. Caller has to reacquire the console.
+ */
+#ifdef CONFIG_64BIT
+static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state old;
+	struct cons_state new;
+
+	cons_state_read(con, CON_STATE_CUR, &old);
+	do {
+		/* Make sure this context is still the owner. */
+		if (!cons_state_bits_match(old, ctxt->state))
+			return false;
+
+		/* Preserve bit state */
+		copy_bit_state(new, old);
+		new.seq = ctxt->newseq;
+
+		/*
+		 * Can race with hostile takeover or with a handover
+		 * request.
+		 */
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new));
+
+	copy_full_state(ctxt->state, new);
+	ctxt->oldseq = ctxt->newseq;
+
+	return true;
+}
+#else
+static bool cons_release(struct cons_context *ctxt);
+static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
+{
+	struct console *con = ctxt->console;
+	struct cons_state state;
+	int pcpu;
+	u32 old;
+	u32 new;
+
+	/*
+	 * There is a corner case that needs to be considered here:
+	 *
+	 * CPU0			CPU1
+	 * printk()
+	 *  acquire()		-> emergency
+	 *  write()		   acquire()
+	 *  update_seq()
+	 *    state == OK
+	 * --> NMI
+	 *			   takeover()
+	 * <---			     write()
+	 *  cmpxchg() succeeds	     update_seq()
+	 *			     cmpxchg() fails
+	 *
+	 * There is nothing that can be done about this other than having
+	 * yet another state bit that needs to be tracked and analyzed,
+	 * which would still fail to cover the problem completely.
+	 *
+	 * No other scenarios expose such a problem. On same CPU takeovers
+	 * the cmpxchg() always fails on the interrupted context after the
+	 * interrupting context finished printing, but that's fine as it
+	 * does not own the console anymore. The state check after the
+	 * failed cmpxchg prevents that.
+	 */
+	cons_state_read(con, CON_STATE_CUR, &state);
+	/* Make sure this context is still the owner. */
+	if (!cons_state_bits_match(state, ctxt->state))
+		return false;
+
+	/*
+	 * Get the original sequence number that was retrieved
+	 * from @con->atomic_seq. @con->atomic_seq should still be
+	 * the same. The cast truncates to 32bit. See cons_context_set_seq().
+	 */
+	old = (u32)ctxt->oldseq;
+	new = (u32)ctxt->newseq;
+	if (atomic_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_seq), &old, new)) {
+		ctxt->oldseq = ctxt->newseq;
+		return true;
+	}
+
+	/*
+	 * Reread the state. If this context does not own the console anymore
+	 * then it cannot touch the sequence again.
+	 */
+	cons_state_read(con, CON_STATE_CUR, &state);
+	if (!cons_state_bits_match(state, ctxt->state))
+		return false;
+
+	pcpu = atomic_read(&panic_cpu);
+	if (pcpu == smp_processor_id()) {
+		/*
+		 * This is the panic CPU. Emitting a warning here does not
+		 * help at all. The callchain is clear and the priority is
+		 * to get the messages out, in the worst case including
+		 * duplicates. Sorting those out is a job for postprocessing.
+		 */
+		atomic_set(&ACCESS_PRIVATE(con, atomic_seq), new);
+		ctxt->oldseq = ctxt->newseq;
+		return true;
+	}
+
+	/*
+	 * Only emit a warning when this happens outside of a panic
+	 * situation. In a panic, a warning is neither useful nor does it
+	 * help the panic CPU to get the important messages out.
+	 */
+	WARN_ON_ONCE(pcpu == PANIC_CPU_INVALID);
+
+	cons_release(ctxt);
+	return false;
+}
+#endif
+
 /**
  * cons_cleanup_handover - Cleanup a handover request
  * @ctxt:	Pointer to acquire context
@@ -542,6 +764,7 @@ static bool __cons_try_acquire(struct cons_context *ctxt)
 	}
 success:
 	/* Common updates on success */
+	cons_context_set_seq(ctxt);
 	cons_context_set_pbufs(ctxt);
 	return true;
 
@@ -739,6 +962,7 @@ bool cons_nobkl_init(struct console *con)
 
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
+	cons_seq_init(con);
 	return true;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 09/18] printk: nobkl: Add print state functions
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (7 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 08/18] printk: nobkl: Add sequence handling John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-29 13:58   ` buffer write race: " Petr Mladek
  2023-03-29 14:05   ` misc details: was: " Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Provide three functions which are related to the safe handover
mechanism and allow console drivers to denote takeover unsafe
sections:

 - console_can_proceed()

   Invoked by a console driver to check whether a handover request
   is pending or whether the console was taken over in a hostile
   fashion.

 - console_enter/exit_unsafe()

   Invoked by a console driver to denote that the driver output
   function is about to enter or to leave a critical region where a
   hostile takeover is unsafe. These functions are also
   cancellation points.

   The unsafe state is stored in the console state and allows a
   takeover attempt to make an informed decision whether to take over
   and/or output on such a console at all. The unsafe state is also
   available to the driver in the write context for the
   write_atomic() output function so the driver can make informed
   decisions about the required actions or take a special emergency
   path.
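
As an illustration, a driver's atomic write callback would bracket its
device access along these lines (a minimal sketch; foo_putc() stands in
for a real device accessor and is hypothetical):

	static bool foo_write_atomic(struct console *con,
				     struct cons_write_context *wctxt)
	{
		unsigned int i;

		/* Bail out if ownership was lost or a handover is pending. */
		if (!console_can_proceed(wctxt))
			return false;

		/* The device programming below must not be torn apart. */
		if (!console_enter_unsafe(wctxt))
			return false;

		for (i = 0; i < wctxt->len; i++)
			foo_putc(con, wctxt->outbuf[i]);

		/* Also a cancellation point: the console may be handed over. */
		return console_exit_unsafe(wctxt);
	}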

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |   3 +
 kernel/printk/printk_nobkl.c | 139 +++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 942cc7f57798..0779757cb917 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -464,6 +464,9 @@ static inline bool console_is_registered(const struct console *con)
 	lockdep_assert_console_list_lock_held();			\
 	hlist_for_each_entry(con, &console_list, node)
 
+extern bool console_can_proceed(struct cons_write_context *wctxt);
+extern bool console_enter_unsafe(struct cons_write_context *wctxt);
+extern bool console_exit_unsafe(struct cons_write_context *wctxt);
 extern bool console_try_acquire(struct cons_write_context *wctxt);
 extern bool console_release(struct cons_write_context *wctxt);
 
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 7184a93a5b0d..3318a79a150a 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -947,6 +947,145 @@ static void cons_free_percpu_data(struct console *con)
 	con->pcpu_data = NULL;
 }
 
+/**
+ * console_can_proceed - Check whether printing can proceed
+ * @wctxt:	The write context that was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked after the record was dumped into the assigned record
+ * buffer and at appropriate safe places in the driver.  For unsafe driver
+ * sections see console_enter_unsafe().
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully. The buffer
+ * content is no longer trusted either and the console lock is no longer
+ * held.
+ */
+bool console_can_proceed(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct cons_state state;
+
+	cons_state_read(con, CON_STATE_CUR, &state);
+	/* Store it for analysis or reuse */
+	copy_full_state(ctxt->old_state, state);
+
+	/* Make sure this context is still the owner. */
+	if (!cons_state_full_match(state, ctxt->state))
+		return false;
+
+	/*
+	 * Having a safe point for takeover, and possibly a few
+	 * duplicated characters or a full line, is way better than a
+	 * hostile takeover. Post processing can take care of the garbage.
+	 * Continue if the requested priority is not sufficient.
+	 */
+	if (state.req_prio <= state.cur_prio)
+		return true;
+
+	/*
+	 * A console printer within an unsafe region is allowed to continue.
+	 * It can perform the handover when exiting the unsafe region. Otherwise
+	 * a hostile takeover will be necessary.
+	 */
+	if (state.unsafe)
+		return true;
+
+	/* Release and hand over */
+	cons_release(ctxt);
+	/*
+	 * This does not check whether the handover succeeded. The
+	 * outermost callsite has to make the final decision whether printing
+	 * should continue or not (via reacquire, possibly hostile). The
+	 * console is unlocked already so go back all the way instead of
+	 * trying to implement heuristics in tons of places.
+	 */
+	return false;
+}
+
+/**
+ * __console_update_unsafe - Update the unsafe bit in @con->atomic_state
+ * @wctxt:	The write context that was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked when entering or exiting an unsafe driver section.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully. The buffer
+ * content is no longer trusted either and the console lock is no longer
+ * held.
+ *
+ * Internal helper to avoid duplicated code.
+ */
+static bool __console_update_unsafe(struct cons_write_context *wctxt, bool unsafe)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct cons_state new;
+
+	do  {
+		if (!console_can_proceed(wctxt))
+			return false;
+		/*
+		 * console_can_proceed() saved the real state in
+		 * ctxt->old_state
+		 */
+		copy_full_state(new, ctxt->old_state);
+		new.unsafe = unsafe;
+
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &ctxt->old_state, &new));
+
+	copy_full_state(ctxt->state, new);
+	return true;
+}
+
+/**
+ * console_enter_unsafe - Enter an unsafe region in the driver
+ * @wctxt:	The write context that was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked before an unsafe driver section is entered.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully. The buffer
+ * content is no longer trusted either and the console lock is no longer
+ * held.
+ */
+bool console_enter_unsafe(struct cons_write_context *wctxt)
+{
+	return __console_update_unsafe(wctxt, true);
+}
+
+/**
+ * console_exit_unsafe - Exit an unsafe region in the driver
+ * @wctxt:	The write context that was handed to the write function
+ *
+ * Returns:	True if the state is correct. False if a handover
+ *		has been requested or if the console was taken
+ *		over.
+ *
+ * Must be invoked before an unsafe driver section is exited.
+ *
+ * When this function returns false then the calling context is not allowed
+ * to go forward and has to back out immediately and carefully. The buffer
+ * content is no longer trusted either and the console lock is no longer
+ * held.
+ */
+bool console_exit_unsafe(struct cons_write_context *wctxt)
+{
+	return __console_update_unsafe(wctxt, false);
+}
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console specific data
  * @con:	Console to initialize
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (8 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 09/18] printk: nobkl: Add print state functions John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-03  0:19   ` kernel test robot
                     ` (2 more replies)
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
                   ` (9 subsequent siblings)
  19 siblings, 3 replies; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Implement an emit function for non-BKL consoles to output printk
messages. It utilizes the lockless printk_get_next_message() and
console_prepend_dropped() functions to retrieve/build the output
message. The emit function includes the required safety points to
check for handover/takeover and calls a new write_atomic callback
of the console driver to output the message. It also includes proper
handling for updating the non-BKL console sequence number.
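
For orientation, the intended call pattern around the emit function, as
seen from within printk_nobkl.c, is roughly the following (a simplified
sketch; the real flush loops in later patches also deal with priorities
and thread wakeups):

	struct cons_write_context wctxt = {
		.ctxt.console	= con,
		.ctxt.prio	= CONS_PRIO_NORMAL,
	};
	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);

	if (console_try_acquire(&wctxt)) {
		do {
			if (!cons_emit_record(&wctxt))
				return;	/* Ownership lost, nothing to release */
		} while (ctxt->backlog);

		console_release(&wctxt);
	}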

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |   8 +++
 kernel/printk/internal.h     |   9 +++
 kernel/printk/printk.c       |  12 ++--
 kernel/printk/printk_nobkl.c | 121 ++++++++++++++++++++++++++++++++++-
 4 files changed, 141 insertions(+), 9 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 0779757cb917..15f71ccfcd9d 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -250,10 +250,12 @@ struct printk_buffers;
  * @newseq:		The sequence number for progress
  * @prio:		Priority of the context
  * @pbufs:		Pointer to the text buffer for this context
+ * @dropped:		Dropped counter for the current context
  * @thread:		The acquire is printk thread context
  * @hostile:		Hostile takeover requested. Cleared on normal
  *			acquire or friendly handover
  * @spinwait:		Spinwait on acquire if possible
+ * @backlog:		Ringbuffer has pending records
  */
 struct cons_context {
 	struct console		*console;
@@ -266,9 +268,11 @@ struct cons_context {
 	unsigned int		spinwait_max_us;
 	enum cons_prio		prio;
 	struct printk_buffers	*pbufs;
+	unsigned long		dropped;
 	unsigned int		thread		: 1;
 	unsigned int		hostile		: 1;
 	unsigned int		spinwait	: 1;
+	unsigned int		backlog		: 1;
 };
 
 /**
@@ -310,6 +314,7 @@ struct cons_context_data;
  * @atomic_state:	State array for NOBKL consoles; real and handover
  * @atomic_seq:		Sequence for record tracking (32bit only)
  * @thread_pbufs:	Pointer to thread private buffer
+ * @write_atomic:	Write callback for atomic context
  * @pcpu_data:		Pointer to percpu context data
  */
 struct console {
@@ -337,6 +342,9 @@ struct console {
 	atomic_t		__private atomic_seq;
 #endif
 	struct printk_buffers	*thread_pbufs;
+
+	bool (*write_atomic)(struct console *con, struct cons_write_context *wctxt);
+
 	struct cons_context_data	__percpu *pcpu_data;
 };
 
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 15a412065327..13dd0ce23c37 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -133,3 +133,12 @@ struct printk_message {
 struct cons_context_data {
 	struct printk_buffers		pbufs;
 };
+
+#ifdef CONFIG_PRINTK
+
+bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
+			     bool is_extended, bool may_supress);
+void console_prepend_dropped(struct printk_message *pmsg,
+			     unsigned long dropped);
+
+#endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 21b31183ff2b..eab0358baa6f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -715,9 +715,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 	return len;
 }
 
-static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
-				    bool is_extended, bool may_supress);
-
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
 	atomic64_t seq;
@@ -2786,7 +2783,7 @@ static void __console_unlock(void)
  * If @pmsg->pbufs->outbuf is modified, @pmsg->outbuf_len is updated.
  */
 #ifdef CONFIG_PRINTK
-static void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
+void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 {
 	struct printk_buffers *pbufs = pmsg->pbufs;
 	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
@@ -2818,7 +2815,8 @@ static void console_prepend_dropped(struct printk_message *pmsg, unsigned long d
 	pmsg->outbuf_len += len;
 }
 #else
-#define console_prepend_dropped(pmsg, dropped)
+static inline void console_prepend_dropped(struct printk_message *pmsg,
+					   unsigned long dropped) { }
 #endif /* CONFIG_PRINTK */
 
 /*
@@ -2840,8 +2838,8 @@ static void console_prepend_dropped(struct printk_message *pmsg, unsigned long d
  * of @pmsg are valid. (See the documentation of struct printk_message
  * for information about the @pmsg fields.)
  */
-static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
-				    bool is_extended, bool may_suppress)
+bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
+			     bool is_extended, bool may_suppress)
 {
 	static int panic_console_dropped;
 
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 3318a79a150a..5c591bced1be 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -317,7 +317,7 @@ static void cons_context_set_seq(struct cons_context *ctxt)
  * invalid. Caller has to reacquire the console.
  */
 #ifdef CONFIG_64BIT
-static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
+static bool cons_seq_try_update(struct cons_context *ctxt)
 {
 	struct console *con = ctxt->console;
 	struct cons_state old;
@@ -346,7 +346,7 @@ static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
 }
 #else
 static bool cons_release(struct cons_context *ctxt);
-static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
+static bool cons_seq_try_update(struct cons_context *ctxt)
 {
 	struct console *con = ctxt->console;
 	struct cons_state state;
@@ -1086,6 +1086,123 @@ bool console_exit_unsafe(struct cons_write_context *wctxt)
 	return __console_update_unsafe(wctxt, false);
 }
 
+/**
+ * cons_get_record - Fill the buffer with the next pending ringbuffer record
+ * @wctxt:	The write context which will be handed to the write function
+ *
+ * Returns:	True if there are records available. If the next record should
+ *		be printed, the output buffer is filled and @wctxt->outbuf
+ *		points to the text to print. If @wctxt->outbuf is NULL after
+ *		the call, the record should not be printed but the caller must
+ *		still update the console sequence number.
+ *
+ *		False means that there are no pending records anymore and the
+ *		printing can stop.
+ */
+static bool cons_get_record(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	bool is_extended = console_srcu_read_flags(con) & CON_EXTENDED;
+	struct printk_message pmsg = {
+		.pbufs = ctxt->pbufs,
+	};
+
+	if (!printk_get_next_message(&pmsg, ctxt->newseq, is_extended, true))
+		return false;
+
+	ctxt->newseq = pmsg.seq;
+	ctxt->dropped += pmsg.dropped;
+
+	if (pmsg.outbuf_len == 0) {
+		wctxt->outbuf = NULL;
+	} else {
+		if (ctxt->dropped && !is_extended)
+			console_prepend_dropped(&pmsg, ctxt->dropped);
+		wctxt->outbuf = &pmsg.pbufs->outbuf[0];
+	}
+
+	wctxt->len = pmsg.outbuf_len;
+
+	return true;
+}
+
+/**
+ * cons_emit_record - Emit record in the acquired context
+ * @wctxt:	The write context that will be handed to the write function
+ *
+ * Returns:	False if the operation was aborted (takeover or handover).
+ *		True otherwise.
+ *
+ * When false is returned, the caller is not allowed to touch console state.
+ * The console is owned by someone else. If the caller wants to print more
+ * it has to reacquire the console first.
+ *
+ * When true is returned, @wctxt->ctxt.backlog indicates whether there are
+ * still records pending in the ringbuffer.
+ */
+static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	bool done = false;
+
+	/*
+	 * @con->dropped is not protected in case of hostile takeovers so
+	 * the update below is racy. Annotate it accordingly.
+	 */
+	ctxt->dropped = data_race(READ_ONCE(con->dropped));
+
+	/* Fill the output buffer with the next record */
+	ctxt->backlog = cons_get_record(wctxt);
+	if (!ctxt->backlog)
+		return true;
+
+	/* Safety point. Don't touch state in case of takeover */
+	if (!console_can_proceed(wctxt))
+		return false;
+
+	/* Counterpart to the read above */
+	WRITE_ONCE(con->dropped, ctxt->dropped);
+
+	/*
+	 * In case of skipped records, update the sequence state in @con.
+	 */
+	if (!wctxt->outbuf)
+		goto update;
+
+	/* Tell the driver about potential unsafe state */
+	wctxt->unsafe = ctxt->state.unsafe;
+
+	if (!ctxt->thread && con->write_atomic) {
+		done = con->write_atomic(con, wctxt);
+	} else {
+		cons_release(ctxt);
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/* If not done, the write was aborted due to takeover */
+	if (!done)
+		return false;
+
+	/* If there was a dropped message, it has now been output. */
+	if (ctxt->dropped) {
+		ctxt->dropped = 0;
+		/* Counterpart to the read above */
+		WRITE_ONCE(con->dropped, ctxt->dropped);
+	}
+update:
+	ctxt->newseq++;
+	/*
+	 * The sequence update attempt is not part of console_release()
+	 * because in panic situations the console is not released by
+	 * the panic CPU until all records are written. On 32bit the
+	 * sequence is separate from state anyway.
+	 */
+	return cons_seq_try_update(ctxt);
+}
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console specific data
  * @con:	Console to initialize
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (9 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-03  1:23   ` kernel test robot
                     ` (5 more replies)
  2023-03-02 19:56 ` [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups John Ogness
                   ` (8 subsequent siblings)
  19 siblings, 6 replies; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Add the infrastructure to create a printer thread per console along
with the required thread function, which is takeover/handover aware.
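
The thread stays on one CPU while printing so that handover and takeover
remain possible. Drivers that have a port lock provide it via the new
port_lock() callback; a sketch of the expected wiring for a serial
driver (foo_console_to_port() is hypothetical):

	static void foo_port_lock(struct console *con, bool do_lock,
				  unsigned long *flags)
	{
		struct uart_port *port = foo_console_to_port(con);

		if (do_lock)
			spin_lock_irqsave(&port->lock, *flags);
		else
			spin_unlock_irqrestore(&port->lock, *flags);
	}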

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |  11 ++
 kernel/printk/internal.h     |  54 ++++++++
 kernel/printk/printk.c       |  52 ++-----
 kernel/printk/printk_nobkl.c | 259 ++++++++++++++++++++++++++++++++++-
 4 files changed, 336 insertions(+), 40 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 15f71ccfcd9d..2c120c3f3c6e 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -17,6 +17,7 @@
 #include <linux/atomic.h>
 #include <linux/bits.h>
 #include <linux/rculist.h>
+#include <linux/rcuwait.h>
 #include <linux/types.h>
 
 struct vc_data;
@@ -314,7 +315,12 @@ struct cons_context_data;
  * @atomic_state:	State array for NOBKL consoles; real and handover
  * @atomic_seq:		Sequence for record tracking (32bit only)
  * @thread_pbufs:	Pointer to thread private buffer
+ * @kthread:		Pointer to kernel thread
+ * @rcuwait:		RCU wait for the kernel thread
+ * @kthread_waiting:	Indicator whether the kthread is waiting to be woken
  * @write_atomic:	Write callback for atomic context
+ * @write_thread:	Write callback for printk threaded printing
+ * @port_lock:		Callback to lock/unlock the port lock
  * @pcpu_data:		Pointer to percpu context data
  */
 struct console {
@@ -342,8 +348,13 @@ struct console {
 	atomic_t		__private atomic_seq;
 #endif
 	struct printk_buffers	*thread_pbufs;
+	struct task_struct	*kthread;
+	struct rcuwait		rcuwait;
+	atomic_t		kthread_waiting;
 
 	bool (*write_atomic)(struct console *con, struct cons_write_context *wctxt);
+	bool (*write_thread)(struct console *con, struct cons_write_context *wctxt);
+	void (*port_lock)(struct console *con, bool do_lock, unsigned long *flags);
 
 	struct cons_context_data	__percpu *pcpu_data;
 };
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 13dd0ce23c37..8856beed65da 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -44,6 +44,8 @@ enum printk_info_flags {
 
 extern struct printk_ringbuffer *prb;
 
+extern bool have_boot_console;
+
 __printf(4, 0)
 int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
@@ -75,6 +77,55 @@ u64 cons_read_seq(struct console *con);
 void cons_nobkl_cleanup(struct console *con);
 bool cons_nobkl_init(struct console *con);
 bool cons_alloc_percpu_data(struct console *con);
+void cons_kthread_create(struct console *con);
+
+/*
+ * Check if the given console is currently capable and allowed to print
+ * records. If the caller only works with certain types of consoles, the
+ * caller is responsible for checking the console type before calling
+ * this function.
+ */
+static inline bool console_is_usable(struct console *con, short flags)
+{
+	if (!(flags & CON_ENABLED))
+		return false;
+
+	if ((flags & CON_SUSPENDED))
+		return false;
+
+	/*
+	 * The usability of a console varies depending on whether
+	 * it is a NOBKL console or not.
+	 */
+
+	if (flags & CON_NO_BKL) {
+		if (have_boot_console)
+			return false;
+
+	} else {
+		if (!con->write)
+			return false;
+		/*
+		 * Console drivers may assume that per-cpu resources have
+		 * been allocated. So unless they're explicitly marked as
+		 * being able to cope (CON_ANYTIME) don't call them until
+		 * this CPU is officially up.
+		 */
+		if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * cons_kthread_wake - Wake up a printk thread
+ * @con:        Console to operate on
+ */
+static inline void cons_kthread_wake(struct console *con)
+{
+	rcuwait_wake_up(&con->rcuwait);
+}
 
 #else
 
@@ -82,6 +133,9 @@ bool cons_alloc_percpu_data(struct console *con);
 #define PRINTK_MESSAGE_MAX	0
 #define PRINTKRB_RECORD_MAX	0
 
+static inline void cons_kthread_wake(struct console *con) { }
+static inline void cons_kthread_create(struct console *con) { }
+
 /*
  * In !PRINTK builds we still export console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index eab0358baa6f..4c6abb033ec1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2723,45 +2723,6 @@ static bool abandon_console_lock_in_panic(void)
 	return atomic_read(&panic_cpu) != raw_smp_processor_id();
 }
 
-/*
- * Check if the given console is currently capable and allowed to print
- * records. If the caller only works with certain types of consoles, the
- * caller is responsible for checking the console type before calling
- * this function.
- */
-static inline bool console_is_usable(struct console *con, short flags)
-{
-	if (!(flags & CON_ENABLED))
-		return false;
-
-	if ((flags & CON_SUSPENDED))
-		return false;
-
-	/*
-	 * The usability of a console varies depending on whether
-	 * it is a NOBKL console or not.
-	 */
-
-	if (flags & CON_NO_BKL) {
-		if (have_boot_console)
-			return false;
-
-	} else {
-		if (!con->write)
-			return false;
-		/*
-		 * Console drivers may assume that per-cpu resources have
-		 * been allocated. So unless they're explicitly marked as
-		 * being able to cope (CON_ANYTIME) don't call them until
-		 * this CPU is officially up.
-		 */
-		if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
-			return false;
-	}
-
-	return true;
-}
-
 static void __console_unlock(void)
 {
 	console_locked = 0;
@@ -3573,10 +3534,14 @@ EXPORT_SYMBOL(register_console);
 /* Must be called under console_list_lock(). */
 static int unregister_console_locked(struct console *console)
 {
+	struct console *c;
+	bool is_boot_con;
 	int res;
 
 	lockdep_assert_console_list_lock_held();
 
+	is_boot_con = console->flags & CON_BOOT;
+
 	con_printk(KERN_INFO, console, "disabled\n");
 
 	res = _braille_unregister_console(console);
@@ -3620,6 +3585,15 @@ static int unregister_console_locked(struct console *console)
 	if (console->exit)
 		res = console->exit(console);
 
+	/*
+	 * Each time a boot console unregisters, try to start up the printing
+	 * threads. They will only start if this was the last boot console.
+	 */
+	if (is_boot_con) {
+		for_each_console(c)
+			cons_kthread_create(c);
+	}
+
 	return res;
 }
 
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 5c591bced1be..bc3b69223897 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -5,6 +5,8 @@
 #include <linux/kernel.h>
 #include <linux/console.h>
 #include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/slab.h>
 #include "printk_ringbuffer.h"
 #include "internal.h"
 /*
@@ -700,6 +702,7 @@ static bool __cons_try_acquire(struct cons_context *ctxt)
 	/* Set up the new state for takeover */
 	copy_full_state(new, old);
 	new.locked = 1;
+	new.thread = ctxt->thread;
 	new.cur_prio = ctxt->prio;
 	new.req_prio = CONS_PRIO_NONE;
 	new.cpu = cpu;
@@ -714,6 +717,14 @@ static bool __cons_try_acquire(struct cons_context *ctxt)
 		goto success;
 	}
 
+	/*
+	 * A threaded printer context will never spin or perform a
+	 * hostile takeover. The atomic writer will wake the thread
+	 * when it is done with the important output.
+	 */
+	if (ctxt->thread)
+		return false;
+
 	/*
 	 * If the active context is on the same CPU then there is
 	 * obviously no handshake possible.
@@ -871,6 +882,9 @@ static bool __cons_release(struct cons_context *ctxt)
 	return true;
 }
 
+static bool printk_threads_enabled __ro_after_init;
+static bool printk_force_atomic __initdata;
+
 /**
  * cons_release - Release the console after output is done
  * @ctxt:	The acquire context that contains the state
@@ -1141,7 +1155,7 @@ static bool cons_get_record(struct cons_write_context *wctxt)
  * When true is returned, @wctxt->ctxt.backlog indicates whether there are
 * still records pending in the ringbuffer.
  */
-static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
+static bool cons_emit_record(struct cons_write_context *wctxt)
 {
 	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 	struct console *con = ctxt->console;
@@ -1176,6 +1190,8 @@ static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
 
 	if (!ctxt->thread && con->write_atomic) {
 		done = con->write_atomic(con, wctxt);
+	} else if (ctxt->thread && con->write_thread) {
+		done = con->write_thread(con, wctxt);
 	} else {
 		cons_release(ctxt);
 		WARN_ON_ONCE(1);
@@ -1203,6 +1219,243 @@ static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
 	return cons_seq_try_update(ctxt);
 }
 
+/**
+ * cons_kthread_should_wakeup - Check whether the printk thread should wakeup
+ * @con:	Console to operate on
+ * @ctxt:	The acquire context that contains the state
+ *		at console_acquire()
+ *
+ * Returns: True if the thread should shut down or if the console is allowed to
+ * print and a record is available. False otherwise.
+ *
+ * After the thread wakes up, it must first check if it should shut down before
+ * attempting any printing.
+ */
+static bool cons_kthread_should_wakeup(struct console *con, struct cons_context *ctxt)
+{
+	bool is_usable;
+	short flags;
+	int cookie;
+
+	if (kthread_should_stop())
+		return true;
+
+	cookie = console_srcu_read_lock();
+	flags = console_srcu_read_flags(con);
+	is_usable = console_is_usable(con, flags);
+	console_srcu_read_unlock(cookie);
+
+	if (!is_usable)
+		return false;
+
+	/* This reads state and sequence on 64-bit; on 32-bit, only state. */
+	cons_state_read(con, CON_STATE_CUR, &ctxt->state);
+
+	/*
+	 * Atomic printing is running on some other CPU. The owner
+	 * will wake the console thread on unlock if necessary.
+	 */
+	if (ctxt->state.locked)
+		return false;
+
+	/* Bring the sequence in @ctxt up to date */
+	cons_context_set_seq(ctxt);
+
+	return prb_read_valid(prb, ctxt->oldseq, NULL);
+}
+
+/**
+ * cons_kthread_func - The printk thread function
+ * @__console:	Console to operate on
+ */
+static int cons_kthread_func(void *__console)
+{
+	struct console *con = __console;
+	struct cons_write_context wctxt = {
+		.ctxt.console	= con,
+		.ctxt.prio	= CONS_PRIO_NORMAL,
+		.ctxt.thread	= 1,
+	};
+	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+	unsigned long flags;
+	short con_flags;
+	bool backlog;
+	int cookie;
+	int ret;
+
+	for (;;) {
+		atomic_inc(&con->kthread_waiting);
+
+		/*
+		 * Provides a full memory barrier vs. cons_kthread_wake().
+		 */
+		ret = rcuwait_wait_event(&con->rcuwait,
+					 cons_kthread_should_wakeup(con, ctxt),
+					 TASK_INTERRUPTIBLE);
+
+		atomic_dec(&con->kthread_waiting);
+
+		if (kthread_should_stop())
+			break;
+
+		/* Wait was interrupted by a spurious signal, go back to sleep */
+		if (ret)
+			continue;
+
+		for (;;) {
+			cookie = console_srcu_read_lock();
+
+			/*
+			 * Ensure this stays on the CPU to make handover and
+			 * takeover possible.
+			 */
+			if (con->port_lock)
+				con->port_lock(con, true, &flags);
+			else
+				migrate_disable();
+
+			/*
+			 * Try to acquire the console without attempting to
+			 * take over. If an atomic printer wants to hand
+			 * back to the thread it simply wakes it up.
+			 */
+			if (!cons_try_acquire(ctxt))
+				break;
+
+			con_flags = console_srcu_read_flags(con);
+
+			if (console_is_usable(con, con_flags)) {
+				/*
+				 * If the emit fails, this context is no
+				 * longer the owner. Abort the processing and
+				 * wait for new records to print.
+				 */
+				if (!cons_emit_record(&wctxt))
+					break;
+				backlog = ctxt->backlog;
+			} else {
+				backlog = false;
+			}
+
+			/*
+			 * If the release fails, this context was not the
+			 * owner. Abort the processing and wait for new
+			 * records to print.
+			 */
+			if (!cons_release(ctxt))
+				break;
+
+			/* Backlog done? */
+			if (!backlog)
+				break;
+
+			if (con->port_lock)
+				con->port_lock(con, false, &flags);
+			else
+				migrate_enable();
+
+			console_srcu_read_unlock(cookie);
+
+			cond_resched();
+		}
+		if (con->port_lock)
+			con->port_lock(con, false, &flags);
+		else
+			migrate_enable();
+
+		console_srcu_read_unlock(cookie);
+	}
+	return 0;
+}
+
+/**
+ * cons_kthread_stop - Stop a printk thread
+ * @con:	Console to operate on
+ */
+static void cons_kthread_stop(struct console *con)
+{
+	lockdep_assert_console_list_lock_held();
+
+	if (!con->kthread)
+		return;
+
+	kthread_stop(con->kthread);
+	con->kthread = NULL;
+
+	kfree(con->thread_pbufs);
+	con->thread_pbufs = NULL;
+}
+
+/**
+ * cons_kthread_create - Create a printk thread
+ * @con:	Console to operate on
+ *
+ * If it fails, let the console proceed. The atomic part might
+ * be usable and useful.
+ */
+void cons_kthread_create(struct console *con)
+{
+	struct task_struct *kt;
+	struct console *c;
+
+	lockdep_assert_console_list_lock_held();
+
+	if (!(con->flags & CON_NO_BKL) || !con->write_thread)
+		return;
+
+	if (!printk_threads_enabled || con->kthread)
+		return;
+
+	/*
+	 * Printer threads cannot be started as long as any boot console is
+	 * registered because there is no way to synchronize the hardware
+	 * registers between boot console code and regular console code.
+	 */
+	for_each_console(c) {
+		if (c->flags & CON_BOOT)
+			return;
+	}
+	have_boot_console = false;
+
+	con->thread_pbufs = kmalloc(sizeof(*con->thread_pbufs), GFP_KERNEL);
+	if (!con->thread_pbufs) {
+		con_printk(KERN_ERR, con, "failed to allocate printing thread buffers\n");
+		return;
+	}
+
+	kt = kthread_run(cons_kthread_func, con, "pr/%s%d", con->name, con->index);
+	if (IS_ERR(kt)) {
+		con_printk(KERN_ERR, con, "failed to start printing thread\n");
+		kfree(con->thread_pbufs);
+		con->thread_pbufs = NULL;
+		return;
+	}
+
+	con->kthread = kt;
+
+	/*
+	 * It is important that console printing threads are scheduled
+	 * shortly after a printk call and with generous runtime budgets.
+	 */
+	sched_set_normal(con->kthread, -20);
+}
+
+static int __init printk_setup_threads(void)
+{
+	struct console *con;
+
+	if (printk_force_atomic)
+		return 0;
+
+	console_list_lock();
+	printk_threads_enabled = true;
+	for_each_console(con)
+		cons_kthread_create(con);
+	console_list_unlock();
+	return 0;
+}
+early_initcall(printk_setup_threads);
+
 /**
  * cons_nobkl_init - Initialize the NOBKL console specific data
  * @con:	Console to initialize
@@ -1216,9 +1469,12 @@ bool cons_nobkl_init(struct console *con)
 	if (!cons_alloc_percpu_data(con))
 		return false;
 
+	rcuwait_init(&con->rcuwait);
+	atomic_set(&con->kthread_waiting, 0);
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
 	cons_seq_init(con);
+	cons_kthread_create(con);
 	return true;
 }
 
@@ -1230,6 +1486,7 @@ void cons_nobkl_cleanup(struct console *con)
 {
 	struct cons_state state = { };
 
+	cons_kthread_stop(con);
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
 	cons_free_percpu_data(con);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (10 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-12  9:38   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 13/18] printk: nobkl: Add write context storage for atomic writes John Ogness
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Add a function to wake up the printer threads. Use the new function
when:

  - records are added to the printk ringbuffer
  - consoles are started
  - consoles are resumed

The actual waking is performed via irq_work so that the wakeup can
be triggered from any context.
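
The irq_work handler side is visible in the diff below; the wake
helper it calls is not shown in this hunk, but based on the rcuwait
usage in the printer thread function it is presumably just a thin
wrapper (a sketch, not the exact patch code):

	static inline void cons_kthread_wake(struct console *con)
	{
		rcuwait_wake_up(&con->rcuwait);
	}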

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |  3 +++
 kernel/printk/internal.h     |  1 +
 kernel/printk/printk.c       | 26 ++++++++++++++++++++++++++
 kernel/printk/printk_nobkl.c | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 2c120c3f3c6e..710f1e72cd0f 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
 
 #include <linux/atomic.h>
 #include <linux/bits.h>
+#include <linux/irq_work.h>
 #include <linux/rculist.h>
 #include <linux/rcuwait.h>
 #include <linux/types.h>
@@ -317,6 +318,7 @@ struct cons_context_data;
  * @thread_pbufs:	Pointer to thread private buffer
  * @kthread:		Pointer to kernel thread
  * @rcuwait:		RCU wait for the kernel thread
+ * @irq_work:		IRQ work for thread wakeup
  * @kthread_waiting:	Indicator whether the kthread is waiting to be woken
  * @write_atomic:	Write callback for atomic context
  * @write_thread:	Write callback for printk threaded printing
@@ -350,6 +352,7 @@ struct console {
 	struct printk_buffers	*thread_pbufs;
 	struct task_struct	*kthread;
 	struct rcuwait		rcuwait;
+	struct irq_work		irq_work;
 	atomic_t		kthread_waiting;
 
 	bool (*write_atomic)(struct console *con, struct cons_write_context *wctxt);
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 8856beed65da..a72402c1ac93 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -78,6 +78,7 @@ void cons_nobkl_cleanup(struct console *con);
 bool cons_nobkl_init(struct console *con);
 bool cons_alloc_percpu_data(struct console *con);
 void cons_kthread_create(struct console *con);
+void cons_wake_threads(void);
 
 /*
  * Check if the given console is currently capable and allowed to print
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4c6abb033ec1..19f682fcae10 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2345,6 +2345,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		preempt_enable();
 	}
 
+	cons_wake_threads();
 	if (in_sched)
 		defer_console_output();
 	else
@@ -2615,6 +2616,8 @@ void suspend_console(void)
 void resume_console(void)
 {
 	struct console *con;
+	short flags;
+	int cookie;
 
 	if (!console_suspend_enabled)
 		return;
@@ -2634,6 +2637,14 @@ void resume_console(void)
 	 */
 	synchronize_srcu(&console_srcu);
 
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(con) {
+		flags = console_srcu_read_flags(con);
+		if (flags & CON_NO_BKL)
+			cons_kthread_wake(con);
+	}
+	console_srcu_read_unlock(cookie);
+
 	pr_flush(1000, true);
 }
 
@@ -3226,9 +3237,23 @@ EXPORT_SYMBOL(console_stop);
 
 void console_start(struct console *console)
 {
+	short flags;
+
 	console_list_lock();
 	console_srcu_write_flags(console, console->flags | CON_ENABLED);
+	flags = console->flags;
 	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. The related
+	 * printing context must be able to see it is enabled so that
+	 * it is guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
+	if (flags & CON_NO_BKL)
+		cons_kthread_wake(console);
+
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -3918,6 +3943,7 @@ void defer_console_output(void)
 
 void printk_trigger_flush(void)
 {
+	cons_wake_threads();
 	defer_console_output();
 }
 
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index bc3b69223897..890fc8d44f1d 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -1368,6 +1368,37 @@ static int cons_kthread_func(void *__console)
 	return 0;
 }
 
+/**
+ * cons_irq_work - irq work to wake printk thread
+ * @irq_work:	The irq work to operate on
+ */
+static void cons_irq_work(struct irq_work *irq_work)
+{
+	struct console *con = container_of(irq_work, struct console, irq_work);
+
+	cons_kthread_wake(con);
+}
+
+/**
+ * cons_wake_threads - Wake up printing threads
+ *
+ * A printing thread is only woken if it is within the @kthread_waiting
+ * block. If it is not within the block (or enters the block later), it
+ * will see any new records and continue printing on its own.
+ */
+void cons_wake_threads(void)
+{
+	struct console *con;
+	int cookie;
+
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(con) {
+		if (con->kthread && atomic_read(&con->kthread_waiting))
+			irq_work_queue(&con->irq_work);
+	}
+	console_srcu_read_unlock(cookie);
+}
+
 /**
  * cons_kthread_stop - Stop a printk thread
  * @con:	Console to operate on
@@ -1471,6 +1502,7 @@ bool cons_nobkl_init(struct console *con)
 
 	rcuwait_init(&con->rcuwait);
 	atomic_set(&con->kthread_waiting, 0);
+	init_irq_work(&con->irq_work, cons_irq_work);
 	cons_state_set(con, CON_STATE_CUR, &state);
 	cons_state_set(con, CON_STATE_REQ, &state);
 	cons_seq_init(con);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 13/18] printk: nobkl: Add write context storage for atomic writes
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (11 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-03-02 19:56 ` [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement John Ogness
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

The number of consoles is unknown at compile time and allocating
write contexts on the stack in emergency/panic situations is not desired
either.

Allocate a write context array (one for each priority level) along
with the per CPU output buffers, thus allowing atomic contexts on
multiple CPUs and priority levels to execute simultaneously without
clobbering each other's write context.
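
For illustration, this layout makes the per-CPU, per-priority lookup
a simple array index (a sketch; the real accessor arrives with the
atomic write enforcement code):

	/* one write context per CPU and per priority level */
	struct cons_write_context *wctxt =
		&this_cpu_ptr(con->pcpu_data)->wctxt[prio];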

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h  | 2 ++
 kernel/printk/internal.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 710f1e72cd0f..089a94a3dd8d 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -222,6 +222,7 @@ struct cons_state {
  * @CONS_PRIO_NORMAL:		Regular printk
  * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
  * @CONS_PRIO_PANIC:		Panic output
+ * @CONS_PRIO_MAX:		The number of priority levels
  *
 * Emergency output can carefully take over the console even without consent
  * of the owner, ideally only when @cons_state::unsafe is not set. Panic
@@ -234,6 +235,7 @@ enum cons_prio {
 	CONS_PRIO_NORMAL,
 	CONS_PRIO_EMERGENCY,
 	CONS_PRIO_PANIC,
+	CONS_PRIO_MAX,
 };
 
 struct console;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index a72402c1ac93..a417e3992b7a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -181,11 +181,16 @@ struct printk_message {
 
 /**
  * struct cons_context_data - console context data
+ * @wctxt:		Write context per priority level
  * @pbufs:		Buffer for storing the text
  *
  * Used for early boot and for per CPU data.
+ *
+ * The write contexts are allocated to avoid having them on the stack, e.g. in
+ * warn() or panic().
  */
 struct cons_context_data {
+	struct cons_write_context	wctxt[CONS_PRIO_MAX];
 	struct printk_buffers		pbufs;
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (12 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 13/18] printk: nobkl: Add write context storage for atomic writes John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-12 14:53   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot John Ogness
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Threaded printk is the preferred mechanism to tame the noisiness of
printk, but WARN/OOPS/PANIC require immediate printing since the
printer threads might not be able to run.

Add per CPU state to denote the priority/urgency of the output and
provide functions to flush the printk backlog for priority-elevated
contexts and for when the printing threads are not available (such
as during early boot).

Note that when a CPU is in a priority-elevated state, flushing only
occurs when dropping back to a lower priority. This allows the full
set of printk records (WARN/OOPS/PANIC output) to be stored in the
ringbuffer before beginning to flush the backlog.
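
As an illustration, the intended bracketing for an emergency section
looks like this (a sketch based on the API added below, not literal
caller code):

	enum cons_prio prev_prio;

	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
	/* emit the full WARN output into the ringbuffer */
	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
	/* the exit path flushes the backlog to the atomic consoles */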

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h      |   8 ++
 include/linux/printk.h       |   9 ++
 kernel/printk/printk.c       |  35 +++--
 kernel/printk/printk_nobkl.c | 240 +++++++++++++++++++++++++++++++++++
 4 files changed, 283 insertions(+), 9 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 089a94a3dd8d..afc683e722bb 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -494,6 +494,14 @@ extern bool console_exit_unsafe(struct cons_write_context *wctxt);
 extern bool console_try_acquire(struct cons_write_context *wctxt);
 extern bool console_release(struct cons_write_context *wctxt);
 
+#ifdef CONFIG_PRINTK
+extern enum cons_prio cons_atomic_enter(enum cons_prio prio);
+extern void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio);
+#else
+static inline enum cons_prio cons_atomic_enter(enum cons_prio prio) { return CONS_PRIO_NONE; }
+static inline void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio) { }
+#endif
+
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 8ef499ab3c1e..d2aafc79b611 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -139,6 +139,7 @@ void early_printk(const char *s, ...) { }
 #endif
 
 struct dev_printk_info;
+struct cons_write_context;
 
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(4, 0)
@@ -192,6 +193,8 @@ void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
 void printk_trigger_flush(void);
+extern void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt,
+			      bool skip_unsafe);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -271,6 +274,12 @@ static inline void dump_stack(void)
 static inline void printk_trigger_flush(void)
 {
 }
+
+static inline void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt,
+				     bool skip_unsafe)
+{
+}
+
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 19f682fcae10..015c240f9f04 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2304,6 +2304,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
 {
+	struct cons_write_context wctxt = { };
 	int printed_len;
 	bool in_sched = false;
 
@@ -2324,16 +2325,25 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
+	/*
+	 * The caller may be holding system-critical or
+	 * timing-sensitive locks. Disable preemption during
+	 * printing of all remaining records to all consoles so that
+	 * this context can return as soon as possible. Hopefully
+	 * another printk() caller will take over the printing.
+	 */
+	preempt_disable();
+
+	/*
+	 * Flush the non-BKL consoles. This only leads to direct atomic
+	 * printing for non-BKL consoles that do not have a printer
+	 * thread available. Otherwise the printer thread will perform
+	 * the printing.
+	 */
+	cons_atomic_flush(&wctxt, true);
+
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched && have_bkl_console) {
-		/*
-		 * The caller may be holding system-critical or
-		 * timing-sensitive locks. Disable preemption during
-		 * printing of all remaining records to all consoles so that
-		 * this context can return as soon as possible. Hopefully
-		 * another printk() caller will take over the printing.
-		 */
-		preempt_disable();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers. With the
@@ -2342,9 +2352,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
-		preempt_enable();
 	}
 
+	preempt_enable();
+
 	cons_wake_threads();
 	if (in_sched)
 		defer_console_output();
@@ -3943,6 +3954,12 @@ void defer_console_output(void)
 
 void printk_trigger_flush(void)
 {
+	struct cons_write_context wctxt = { };
+
+	preempt_disable();
+	cons_atomic_flush(&wctxt, true);
+	preempt_enable();
+
 	cons_wake_threads();
 	defer_console_output();
 }
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 890fc8d44f1d..001a1ca9793f 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -1399,6 +1399,246 @@ void cons_wake_threads(void)
 	console_srcu_read_unlock(cookie);
 }
 
+/**
+ * struct cons_cpu_state - Per CPU printk context state
+ * @prio:	The current context priority level
+ * @nesting:	Per priority nest counter
+ */
+struct cons_cpu_state {
+	enum cons_prio	prio;
+	int		nesting[CONS_PRIO_MAX];
+};
+
+static DEFINE_PER_CPU(struct cons_cpu_state, cons_pcpu_state);
+static struct cons_cpu_state early_cons_pcpu_state __initdata;
+
+/**
+ * cons_get_cpu_state - Get the per CPU console state pointer
+ *
+ * Returns either a pointer to the per CPU state of the current CPU or to
+ * the init data state during early boot.
+ */
+static __ref struct cons_cpu_state *cons_get_cpu_state(void)
+{
+	if (!printk_percpu_data_ready())
+		return &early_cons_pcpu_state;
+
+	return this_cpu_ptr(&cons_pcpu_state);
+}
+
+/**
+ * cons_get_wctxt - Get the write context for atomic printing
+ * @con:	Console to operate on
+ * @prio:	Priority of the context
+ *
+ * Returns either the per CPU context or the builtin context for
+ * early boot.
+ */
+static __ref struct cons_write_context *cons_get_wctxt(struct console *con,
+						       enum cons_prio prio)
+{
+	if (!con->pcpu_data)
+		return &early_cons_ctxt_data.wctxt[prio];
+
+	return &this_cpu_ptr(con->pcpu_data)->wctxt[prio];
+}
+
+/**
+ * cons_atomic_try_acquire - Try to acquire the console for atomic printing
+ * @con:	The console to acquire
+ * @ctxt:	The console context instance to work on
+ * @prio:	The priority of the current context
+ * @skip_unsafe:	True to avoid unsafe hostile takeovers
+ */
+static bool cons_atomic_try_acquire(struct console *con, struct cons_context *ctxt,
+				    enum cons_prio prio, bool skip_unsafe)
+{
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->console		= con;
+	ctxt->spinwait_max_us	= 2000;
+	ctxt->prio		= prio;
+	ctxt->spinwait		= 1;
+
+	/* Try to acquire it directly or via a friendly handover */
+	if (cons_try_acquire(ctxt))
+		return true;
+
+	/* Investigate whether a hostile takeover is due */
+	if (ctxt->old_state.cur_prio >= prio)
+		return false;
+
+	if (!ctxt->old_state.unsafe || !skip_unsafe)
+		ctxt->hostile = 1;
+	return cons_try_acquire(ctxt);
+}
+
+/**
+ * cons_atomic_flush_con - Flush one console in atomic mode
+ * @wctxt:		The write context struct to use for this context
+ * @con:		The console to flush
+ * @prio:		The priority of the current context
+ * @skip_unsafe:	True to avoid unsafe hostile takeovers
+ */
+static void cons_atomic_flush_con(struct cons_write_context *wctxt, struct console *con,
+				  enum cons_prio prio, bool skip_unsafe)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	bool wake_thread = false;
+	short flags;
+
+	if (!cons_atomic_try_acquire(con, ctxt, prio, skip_unsafe))
+		return;
+
+	do {
+		flags = console_srcu_read_flags(con);
+
+		if (!console_is_usable(con, flags))
+			break;
+
+		/*
+		 * For normal prio messages let the printer thread handle
+		 * the printing if it is available.
+		 */
+		if (prio <= CONS_PRIO_NORMAL && con->kthread) {
+			wake_thread = true;
+			break;
+		}
+
+		/*
+		 * cons_emit_record() returns false when the console was
+		 * handed over or taken over. In both cases the context is
+		 * no longer valid.
+		 */
+		if (!cons_emit_record(wctxt))
+			return;
+	} while (ctxt->backlog);
+
+	cons_release(ctxt);
+
+	if (wake_thread && atomic_read(&con->kthread_waiting))
+		irq_work_queue(&con->irq_work);
+}
+
+/**
+ * cons_atomic_flush - Flush consoles in atomic mode if required
+ * @printk_caller_wctxt:	The write context struct to use for this
+ *				context (for printk() context only)
+ * @skip_unsafe:		True to avoid unsafe hostile takeovers
+ */
+void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt, bool skip_unsafe)
+{
+	struct cons_write_context *wctxt;
+	struct cons_cpu_state *cpu_state;
+	struct console *con;
+	short flags;
+	int cookie;
+
+	cpu_state = cons_get_cpu_state();
+
+	/*
+	 * When in an elevated priority, the printk() calls are not
+	 * individually flushed. This is to allow the full output to
+	 * be dumped to the ringbuffer before starting with printing
+	 * the backlog.
+	 */
+	if (cpu_state->prio > CONS_PRIO_NORMAL && printk_caller_wctxt)
+		return;
+
+	/*
+	 * Let the outermost write of this priority print. This avoids
+	 * nasty hackery for nested WARN() where the printing itself
+	 * generates one.
+	 *
+	 * cpu_state->prio <= CONS_PRIO_NORMAL is not subject to nesting
+	 * and can proceed in order to allow atomic printing when consoles
+	 * do not have a printer thread.
+	 */
+	if (cpu_state->prio > CONS_PRIO_NORMAL &&
+	    cpu_state->nesting[cpu_state->prio] != 1)
+		return;
+
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(con) {
+		if (!con->write_atomic)
+			continue;
+
+		flags = console_srcu_read_flags(con);
+
+		if (!console_is_usable(con, flags))
+			continue;
+
+		if (cpu_state->prio > CONS_PRIO_NORMAL || !con->kthread) {
+			if (printk_caller_wctxt)
+				wctxt = printk_caller_wctxt;
+			else
+				wctxt = cons_get_wctxt(con, cpu_state->prio);
+			cons_atomic_flush_con(wctxt, con, cpu_state->prio, skip_unsafe);
+		}
+	}
+	console_srcu_read_unlock(cookie);
+}
+
+/**
+ * cons_atomic_enter - Enter a context that enforces atomic printing
+ * @prio:	Priority of the context
+ *
+ * Returns:	The previous priority that needs to be fed into
+ *		the corresponding cons_atomic_exit()
+ */
+enum cons_prio cons_atomic_enter(enum cons_prio prio)
+{
+	struct cons_cpu_state *cpu_state;
+	enum cons_prio prev_prio;
+
+	migrate_disable();
+	cpu_state = cons_get_cpu_state();
+
+	prev_prio = cpu_state->prio;
+	if (prev_prio < prio)
+		cpu_state->prio = prio;
+
+	/*
+	 * Increment the nesting on @cpu_state->prio so a WARN()
+	 * nested into a panic printout does not attempt to
+	 * scribble state.
+	 */
+	cpu_state->nesting[cpu_state->prio]++;
+
+	return prev_prio;
+}
+
+/**
+ * cons_atomic_exit - Exit a context that enforces atomic printing
+ * @prio:	Priority of the context to leave
+ * @prev_prio:	Priority of the previous context for restore
+ *
+ * @prev_prio is the priority returned by the corresponding cons_atomic_enter().
+ */
+void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio)
+{
+	struct cons_cpu_state *cpu_state;
+
+	cons_atomic_flush(NULL, true);
+
+	cpu_state = cons_get_cpu_state();
+
+	if (cpu_state->prio == CONS_PRIO_PANIC)
+		cons_atomic_flush(NULL, false);
+
+	/*
+	 * Undo the nesting of cons_atomic_enter() at the CPU state
+	 * priority.
+	 */
+	cpu_state->nesting[cpu_state->prio]--;
+
+	/*
+	 * Restore the previous priority, which was returned by
+	 * cons_atomic_enter().
+	 */
+	cpu_state->prio = prev_prio;
+
+	migrate_enable();
+}
+
 /**
  * cons_kthread_stop - Stop a printk thread
  * @con:	Console to operate on
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (13 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-13  9:03   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic John Ogness
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Register a syscore_ops shutdown function to stop all threaded
printers on shutdown/reboot. This allows printk to transition back
to atomic printing in order to provide a robust mechanism for
outputting the final messages.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk_nobkl.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 001a1ca9793f..53989c8f1dbc 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -7,6 +7,7 @@
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/slab.h>
+#include <linux/syscore_ops.h>
 #include "printk_ringbuffer.h"
 #include "internal.h"
 /*
@@ -1763,3 +1764,33 @@ void cons_nobkl_cleanup(struct console *con)
 	cons_state_set(con, CON_STATE_REQ, &state);
 	cons_free_percpu_data(con);
 }
+
+/**
+ * printk_kthread_shutdown - shutdown all threaded printers
+ *
+ * On system shutdown all threaded printers are stopped. This allows printk
+ * to transition back to atomic printing, thus providing a robust mechanism
+ * for the final shutdown/reboot messages to be output.
+ */
+static void printk_kthread_shutdown(void)
+{
+	struct console *con;
+
+	console_list_lock();
+	for_each_console(con) {
+		if (con->flags & CON_NO_BKL)
+			cons_kthread_stop(con);
+	}
+	console_list_unlock();
+}
+
+static struct syscore_ops printk_syscore_ops = {
+	.shutdown = printk_kthread_shutdown,
+};
+
+static int __init printk_init_ops(void)
+{
+	register_syscore_ops(&printk_syscore_ops);
+	return 0;
+}
+device_initcall(printk_init_ops);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (14 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-13 10:08   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls John Ogness
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Guilherme G. Piccoli,
	Luis Chamberlain, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng

From: Thomas Gleixner <tglx@linutronix.de>

Invoke the atomic write enforcement functions for warn/panic to
ensure that the information gets out to the consoles.

For the panic case, add explicit intermediate atomic flush calls to
ensure immediate flushing at important points. Otherwise the atomic
flushing only occurs when dropping out of the elevated priority,
which for panic may never happen.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of the
atomic consoles. Optimally there should be no legacy consoles
registered.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 kernel/panic.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index da323209f583..db9834fbdf26 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -209,6 +209,7 @@ static void panic_print_sys_info(bool console_flush)
  */
 void panic(const char *fmt, ...)
 {
+	enum cons_prio prev_prio;
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0, len;
@@ -256,6 +257,8 @@ void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	prev_prio = cons_atomic_enter(CONS_PRIO_PANIC);
+
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
@@ -329,6 +332,8 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
+	cons_atomic_flush(NULL, true);
+
 	console_unblank();
 
 	/*
@@ -353,6 +358,7 @@ void panic(const char *fmt, ...)
 		 * We can't use the "normal" timers since we just panicked.
 		 */
 		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+		cons_atomic_flush(NULL, true);
 
 		for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
 			touch_nmi_watchdog();
@@ -371,6 +377,7 @@ void panic(const char *fmt, ...)
 		 */
 		if (panic_reboot_mode != REBOOT_UNDEFINED)
 			reboot_mode = panic_reboot_mode;
+		cons_atomic_flush(NULL, true);
 		emergency_restart();
 	}
 #ifdef __sparc__
@@ -383,12 +390,16 @@ void panic(const char *fmt, ...)
 	}
 #endif
 #if defined(CONFIG_S390)
+	cons_atomic_flush(NULL, true);
 	disabled_wait();
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
 	/* Do not scroll important messages printed above */
 	suppress_printk = 1;
+
+	cons_atomic_exit(CONS_PRIO_PANIC, prev_prio);
+
 	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
@@ -599,6 +610,10 @@ struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	enum cons_prio prev_prio;
+
+	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
+
 	disable_trace_on_warning();
 
 	if (file)
@@ -630,6 +645,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 
 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
 }
 
 #ifndef __WARN_FLAGS
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (15 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-13 12:10   ` Petr Mladek
  2023-03-02 19:56 ` [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic() John Ogness
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu

Invoke the atomic write enforcement functions for RCU stalls to
ensure that the information gets out to the consoles.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of the
atomic consoles. Optimally there should be no legacy consoles
registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/rcu/tree_stall.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5653560573e2..25207a213e7a 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -8,6 +8,7 @@
  */
 
 #include <linux/kvm_para.h>
+#include <linux/console.h>
 
 //////////////////////////////////////////////////////////////////////////////
 //
@@ -551,6 +552,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
 
 static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 {
+	enum cons_prio prev_prio;
 	int cpu;
 	unsigned long flags;
 	unsigned long gpa;
@@ -566,6 +568,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	if (rcu_stall_is_suppressed())
 		return;
 
+	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
+
 	/*
 	 * OK, time to rat on our buddy...
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -620,6 +624,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	panic_on_rcu_stall();
 
 	rcu_force_quiescent_state();  /* Kick them all. */
+
+	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_cpu_stall(unsigned long gps)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic()
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (16 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls John Ogness
@ 2023-03-02 19:56 ` John Ogness
  2023-04-13 12:20   ` Petr Mladek
  2023-03-02 19:58 ` [PATCH printk v1 00/18] serial: 8250: implement non-BKL console John Ogness
  2023-03-09 10:55 ` [PATCH printk v1 00/18] threaded/atomic console support Daniel Thompson
  19 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Typically the panic() function will take care of atomically flushing the
non-BKL consoles on panic. However, there are several users of
console_flush_on_panic() outside of panic().

Also perform atomic flushing in console_flush_on_panic(). A new
function cons_force_seq() is implemented to support the
mode=CONSOLE_REPLAY_ALL feature.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h     |  2 ++
 kernel/printk/printk.c       | 28 ++++++++++++++++++++++------
 kernel/printk/printk_nobkl.c | 24 ++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index a417e3992b7a..f147ca386afa 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -79,6 +79,7 @@ bool cons_nobkl_init(struct console *con);
 bool cons_alloc_percpu_data(struct console *con);
 void cons_kthread_create(struct console *con);
 void cons_wake_threads(void);
+void cons_force_seq(struct console *con, u64 seq);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -148,6 +149,7 @@ static inline void cons_kthread_create(struct console *con) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 static inline bool cons_nobkl_init(struct console *con) { return true; }
 static inline void cons_nobkl_cleanup(struct console *con) { }
+static inline void cons_force_seq(struct console *con, u64 seq) { }
 
 #endif /* CONFIG_PRINTK */
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 015c240f9f04..9a8ba8b3dca5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3160,6 +3160,28 @@ void console_unblank(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	struct console *c;
+	short flags;
+	int cookie;
+	u64 seq;
+
+	seq = prb_first_valid_seq(prb);
+
+	/*
+	 * Safely flush the atomic consoles before trying to flush any
+	 * BKL/legacy consoles.
+	 */
+	if (mode == CONSOLE_REPLAY_ALL) {
+		cookie = console_srcu_read_lock();
+		for_each_console_srcu(c) {
+			flags = console_srcu_read_flags(c);
+			if (flags & CON_NO_BKL)
+				cons_force_seq(c, seq);
+		}
+		console_srcu_read_unlock(cookie);
+	}
+	cons_atomic_flush(NULL, true);
+
 	if (!have_bkl_console)
 		return;
 
@@ -3174,12 +3196,6 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	console_may_schedule = 0;
 
 	if (mode == CONSOLE_REPLAY_ALL) {
-		struct console *c;
-		int cookie;
-		u64 seq;
-
-		seq = prb_first_valid_seq(prb);
-
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(c) {
 			/*
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 53989c8f1dbc..ac2ba785500e 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -233,6 +233,30 @@ static void cons_seq_init(struct console *con)
 #endif
 }
 
+/**
+ * cons_force_seq - Force a specified sequence number for a console
+ * @con:	Console to work on
+ * @seq:	Sequence number to force
+ *
+ * This function is only intended to be used in emergency situations. In
+ * particular: console_flush_on_panic(CONSOLE_REPLAY_ALL)
+ */
+void cons_force_seq(struct console *con, u64 seq)
+{
+#ifdef CONFIG_64BIT
+	struct cons_state old;
+	struct cons_state new;
+
+	do {
+		cons_state_read(con, CON_STATE_CUR, &old);
+		copy_bit_state(new, old);
+		new.seq = seq;
+	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new));
+#else
+	atomic_set(&ACCESS_PRIVATE(con, atomic_seq), seq);
+#endif
+}
+
 static inline u64 cons_expand_seq(u64 seq)
 {
 	u64 rbseq;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (17 preceding siblings ...)
  2023-03-02 19:56 ` [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic() John Ogness
@ 2023-03-02 19:58 ` John Ogness
  2023-03-28 13:33   ` locking API: was: " Petr Mladek
  2023-03-28 13:59   ` [PATCH printk v1 00/18] POC: serial: 8250: implement nbcon console John Ogness
  2023-03-09 10:55 ` [PATCH printk v1 00/18] threaded/atomic console support Daniel Thompson
  19 siblings, 2 replies; 92+ messages in thread
From: John Ogness @ 2023-03-02 19:58 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

Implement the necessary callbacks to allow the 8250 console driver
to perform as a non-BKL console. Remove the implementation for the
legacy console callback (write) and add implementations for the
non-BKL consoles (write_atomic, write_thread, port_lock) and add
CON_NO_BKL to the initial flags.

This is an all-in-one commit meant only for testing the new printk
non-BKL infrastructure. It is not meant to be included mainline in
this form. In particular, it includes mainline driver fixes that
need to be submitted individually.

Although non-BKL consoles can coexist with legacy consoles, you
will only receive all the benefits of the non-BKL consoles if
this console driver is the only console. That means no netconsole,
no tty1, no earlyprintk, no earlycon. Just the uart8250.

For example: console=ttyS0,115200

Signed-off-by: John Ogness <john.ogness@linutronix.de>
diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index 287153d32536..d8da34bb9ae3 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -177,12 +177,154 @@ static inline void serial_dl_write(struct uart_8250_port *up, int value)
 	up->dl_write(up, value);
 }
 
+static inline bool serial8250_is_console(struct uart_port *port)
+{
+	return uart_console(port) && !hlist_unhashed_lockless(&port->cons->node);
+}
+
+static inline void serial8250_init_wctxt(struct cons_write_context *wctxt,
+					 struct console *cons)
+{
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	memset(wctxt, 0, sizeof(*wctxt));
+	ctxt->console = cons;
+	ctxt->prio = CONS_PRIO_NORMAL;
+	/* Both require the port lock, so they cannot clobber each other. */
+	ctxt->thread = 1;
+}
+
+static inline void serial8250_console_acquire(struct cons_write_context *wctxt,
+					      struct console *cons)
+{
+	serial8250_init_wctxt(wctxt, cons);
+	while (!console_try_acquire(wctxt)) {
+		cpu_relax();
+		serial8250_init_wctxt(wctxt, cons);
+	}
+}
+
+static inline void serial8250_enter_unsafe(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+
+	lockdep_assert_held_once(&port->lock);
+
+	for (;;) {
+		up->cookie = console_srcu_read_lock();
+
+		serial8250_console_acquire(&up->wctxt, port->cons);
+
+		if (console_enter_unsafe(&up->wctxt))
+			break;
+
+		console_srcu_read_unlock(up->cookie);
+		cpu_relax();
+	}
+}
+
+static inline void serial8250_exit_unsafe(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+
+	lockdep_assert_held_once(&port->lock);
+
+	/*
+	 * FIXME: The 8250 driver does not support hostile takeovers
+	 * in the unsafe section.
+	 */
+	if (!WARN_ON_ONCE(!console_exit_unsafe(&up->wctxt)))
+		WARN_ON_ONCE(!console_release(&up->wctxt));
+
+	console_srcu_read_unlock(up->cookie);
+}
+
+static inline int serial8250_in_IER(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
+
+	return ier;
+}
+
+static inline bool __serial8250_set_IER(struct uart_8250_port *up,
+					struct cons_write_context *wctxt,
+					int ier)
+{
+	if (wctxt && !console_can_proceed(wctxt))
+		return false;
+	serial_out(up, UART_IER, ier);
+	return true;
+}
+
+static inline void serial8250_set_IER(struct uart_8250_port *up, int ier)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
+		serial8250_enter_unsafe(up);
+		__serial8250_set_IER(up, &up->wctxt, ier);
+		serial8250_exit_unsafe(up);
+	} else {
+		__serial8250_set_IER(up, NULL, ier);
+	}
+}
+
+static inline bool __serial8250_clear_IER(struct uart_8250_port *up,
+					  struct cons_write_context *wctxt,
+					  int *prior)
+{
+	unsigned int clearval = 0;
+
+	if (up->capabilities & UART_CAP_UUE)
+		clearval = UART_IER_UUE;
+
+	*prior = serial_in(up, UART_IER);
+	if (wctxt && !console_can_proceed(wctxt))
+		return false;
+	serial_out(up, UART_IER, clearval);
+	return true;
+}
+
+static inline int serial8250_clear_IER(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int prior;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
+		serial8250_enter_unsafe(up);
+		__serial8250_clear_IER(up, &up->wctxt, &prior);
+		serial8250_exit_unsafe(up);
+	} else {
+		__serial8250_clear_IER(up, NULL, &prior);
+	}
+
+	return prior;
+}
+
 static inline bool serial8250_set_THRI(struct uart_8250_port *up)
 {
 	if (up->ier & UART_IER_THRI)
 		return false;
 	up->ier |= UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
@@ -191,7 +333,7 @@ static inline bool serial8250_clear_THRI(struct uart_8250_port *up)
 	if (!(up->ier & UART_IER_THRI))
 		return false;
 	up->ier &= ~UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
diff --git a/drivers/tty/serial/8250/8250_aspeed_vuart.c b/drivers/tty/serial/8250/8250_aspeed_vuart.c
index 9d2a7856784f..7cc6b527c088 100644
--- a/drivers/tty/serial/8250/8250_aspeed_vuart.c
+++ b/drivers/tty/serial/8250/8250_aspeed_vuart.c
@@ -278,7 +278,7 @@ static void __aspeed_vuart_set_throttle(struct uart_8250_port *up,
 	up->ier &= ~irqs;
 	if (!throttle)
 		up->ier |= irqs;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 }
 static void aspeed_vuart_set_throttle(struct uart_port *port, bool throttle)
 {
diff --git a/drivers/tty/serial/8250/8250_bcm7271.c b/drivers/tty/serial/8250/8250_bcm7271.c
index ed5a94747692..adb1a3247807 100644
--- a/drivers/tty/serial/8250/8250_bcm7271.c
+++ b/drivers/tty/serial/8250/8250_bcm7271.c
@@ -606,8 +606,10 @@ static int brcmuart_startup(struct uart_port *port)
 	 * Disable the Receive Data Interrupt because the DMA engine
 	 * will handle this.
 	 */
+	spin_lock_irq(&port->lock);
 	up->ier &= ~UART_IER_RDI;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
+	spin_unlock_irq(&port->lock);
 
 	priv->tx_running = false;
 	priv->dma.rx_dma = NULL;
@@ -787,6 +789,12 @@ static int brcmuart_handle_irq(struct uart_port *p)
 		spin_lock_irqsave(&p->lock, flags);
 		status = serial_port_in(p, UART_LSR);
 		if ((status & UART_LSR_DR) == 0) {
+			bool is_console;
+
+			is_console = serial8250_is_console(p);
+
+			if (is_console)
+				serial8250_enter_unsafe(up_to_u8250p(p));
 
 			ier = serial_port_in(p, UART_IER);
 			/*
@@ -807,6 +815,9 @@ static int brcmuart_handle_irq(struct uart_port *p)
 				serial_port_in(p, UART_RX);
 			}
 
+			if (is_console)
+				serial8250_exit_unsafe(up_to_u8250p(p));
+
 			handled = 1;
 		}
 		spin_unlock_irqrestore(&p->lock, flags);
@@ -844,12 +855,22 @@ static enum hrtimer_restart brcmuart_hrtimer_func(struct hrtimer *t)
 	/* re-enable receive unless upper layer has disabled it */
 	if ((up->ier & (UART_IER_RLSI | UART_IER_RDI)) ==
 	    (UART_IER_RLSI | UART_IER_RDI)) {
+		bool is_console;
+
+		is_console = serial8250_is_console(p);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		status = serial_port_in(p, UART_IER);
 		status |= (UART_IER_RLSI | UART_IER_RDI);
 		serial_port_out(p, UART_IER, status);
 		status = serial_port_in(p, UART_MCR);
 		status |= UART_MCR_RTS;
 		serial_port_out(p, UART_MCR, status);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
 	}
 	spin_unlock_irqrestore(&p->lock, flags);
 	return HRTIMER_NORESTART;
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index ab63c308be0a..688ecfc6e1d5 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -256,6 +256,7 @@ static void serial8250_timeout(struct timer_list *t)
 static void serial8250_backup_timeout(struct timer_list *t)
 {
 	struct uart_8250_port *up = from_timer(up, t, timer);
+	struct uart_port *port = &up->port;
 	unsigned int iir, ier = 0, lsr;
 	unsigned long flags;
 
@@ -266,8 +267,18 @@ static void serial8250_backup_timeout(struct timer_list *t)
 	 * based handler.
 	 */
 	if (up->port.irq) {
+		bool is_console;
+
+		is_console = serial8250_is_console(port);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		ier = serial_in(up, UART_IER);
 		serial_out(up, UART_IER, 0);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
 	}
 
 	iir = serial_in(up, UART_IIR);
@@ -290,7 +301,7 @@ static void serial8250_backup_timeout(struct timer_list *t)
 		serial8250_tx_chars(up);
 
 	if (up->port.irq)
-		serial_out(up, UART_IER, ier);
+		serial8250_set_IER(up, ier);
 
 	spin_unlock_irqrestore(&up->port.lock, flags);
 
@@ -576,12 +587,30 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
 
 #ifdef CONFIG_SERIAL_8250_CONSOLE
 
-static void univ8250_console_write(struct console *co, const char *s,
-				   unsigned int count)
+static void univ8250_console_port_lock(struct console *con, bool do_lock, unsigned long *flags)
+{
+	struct uart_8250_port *up = &serial8250_ports[con->index];
+
+	if (do_lock)
+		spin_lock_irqsave(&up->port.lock, *flags);
+	else
+		spin_unlock_irqrestore(&up->port.lock, *flags);
+}
+
+static bool univ8250_console_write_atomic(struct console *co,
+					  struct cons_write_context *wctxt)
+{
+	struct uart_8250_port *up = &serial8250_ports[co->index];
+
+	return serial8250_console_write_atomic(up, wctxt);
+}
+
+static bool univ8250_console_write_thread(struct console *co,
+					  struct cons_write_context *wctxt)
 {
 	struct uart_8250_port *up = &serial8250_ports[co->index];
 
-	serial8250_console_write(up, s, count);
+	return serial8250_console_write_thread(up, wctxt);
 }
 
 static int univ8250_console_setup(struct console *co, char *options)
@@ -669,12 +698,14 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
 
 static struct console univ8250_console = {
 	.name		= "ttyS",
-	.write		= univ8250_console_write,
+	.write_atomic	= univ8250_console_write_atomic,
+	.write_thread	= univ8250_console_write_thread,
+	.port_lock	= univ8250_console_port_lock,
 	.device		= uart_console_device,
 	.setup		= univ8250_console_setup,
 	.exit		= univ8250_console_exit,
 	.match		= univ8250_console_match,
-	.flags		= CON_PRINTBUFFER | CON_ANYTIME,
+	.flags		= CON_PRINTBUFFER | CON_ANYTIME | CON_NO_BKL,
 	.index		= -1,
 	.data		= &serial8250_reg,
 };
@@ -962,7 +993,7 @@ static void serial_8250_overrun_backoff_work(struct work_struct *work)
 	spin_lock_irqsave(&port->lock, flags);
 	up->ier |= UART_IER_RLSI | UART_IER_RDI;
 	up->port.read_status_mask |= UART_LSR_DR;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	spin_unlock_irqrestore(&port->lock, flags);
 }
 
diff --git a/drivers/tty/serial/8250/8250_exar.c b/drivers/tty/serial/8250/8250_exar.c
index 64770c62bbec..ccb70b20b1f4 100644
--- a/drivers/tty/serial/8250/8250_exar.c
+++ b/drivers/tty/serial/8250/8250_exar.c
@@ -185,6 +185,10 @@ static void xr17v35x_set_divisor(struct uart_port *p, unsigned int baud,
 
 static int xr17v35x_startup(struct uart_port *port)
 {
+	struct uart_8250_port *up = up_to_u8250p(port);
+
+	spin_lock_irq(&port->lock);
+
 	/*
 	 * First enable access to IER [7:5], ISR [5:4], FCR [5:4],
 	 * MCR [7:5] and MSR [7:0]
@@ -195,7 +199,9 @@ static int xr17v35x_startup(struct uart_port *port)
 	 * Make sure all interrups are masked until initialization is
 	 * complete and the FIFOs are cleared
 	 */
-	serial_port_out(port, UART_IER, 0);
+	serial8250_set_IER(up, 0);
+
+	spin_unlock_irq(&port->lock);
 
 	return serial8250_do_startup(port);
 }
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 8aad15622a2e..74bb85b705e7 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -58,7 +58,8 @@ int fsl8250_handle_irq(struct uart_port *port)
 	if ((orig_lsr & UART_LSR_OE) && (up->overrun_backoff_time_ms > 0)) {
 		unsigned long delay;
 
-		up->ier = port->serial_in(port, UART_IER);
+		up->ier = serial8250_in_IER(up);
+
 		if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
 			port->ops->stop_rx(port);
 		} else {
diff --git a/drivers/tty/serial/8250/8250_ingenic.c b/drivers/tty/serial/8250/8250_ingenic.c
index 617b8ce60d6b..548904c3d11b 100644
--- a/drivers/tty/serial/8250/8250_ingenic.c
+++ b/drivers/tty/serial/8250/8250_ingenic.c
@@ -171,6 +171,7 @@ OF_EARLYCON_DECLARE(x1000_uart, "ingenic,x1000-uart",
 
 static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
 {
+	struct uart_8250_port *up = up_to_u8250p(p);
 	int ier;
 
 	switch (offset) {
@@ -192,7 +193,7 @@ static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
 		 * If we have enabled modem status IRQs we should enable
 		 * modem mode.
 		 */
-		ier = p->serial_in(p, UART_IER);
+		ier = serial8250_in_IER(up);
 
 		if (ier & UART_IER_MSI)
 			value |= UART_MCR_MDCE | UART_MCR_FCM;
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index fb1d5ec0940e..bf7ab55c8923 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -222,12 +222,38 @@ static void mtk8250_shutdown(struct uart_port *port)
 
 static void mtk8250_disable_intrs(struct uart_8250_port *up, int mask)
 {
-	serial_out(up, UART_IER, serial_in(up, UART_IER) & (~mask));
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+	serial_out(up, UART_IER, ier & (~mask));
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
 }
 
 static void mtk8250_enable_intrs(struct uart_8250_port *up, int mask)
 {
-	serial_out(up, UART_IER, serial_in(up, UART_IER) | mask);
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+	serial_out(up, UART_IER, ier | mask);
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
 }
 
 static void mtk8250_set_flow_ctrl(struct uart_8250_port *up, int mode)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 734f092ef839..bfa50a26349d 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -334,8 +334,7 @@ static void omap8250_restore_regs(struct uart_8250_port *up)
 
 	/* drop TCR + TLR access, we setup XON/XOFF later */
 	serial8250_out_MCR(up, mcr);
-
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	serial_dl_write(up, priv->quot);
@@ -523,16 +522,21 @@ static void omap_8250_pm(struct uart_port *port, unsigned int state,
 	u8 efr;
 
 	pm_runtime_get_sync(port->dev);
+
+	spin_lock_irq(&port->lock);
+
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	efr = serial_in(up, UART_EFR);
 	serial_out(up, UART_EFR, efr | UART_EFR_ECB);
 	serial_out(up, UART_LCR, 0);
 
-	serial_out(up, UART_IER, (state != 0) ? UART_IERX_SLEEP : 0);
+	serial8250_set_IER(up, (state != 0) ? UART_IERX_SLEEP : 0);
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	serial_out(up, UART_EFR, efr);
 	serial_out(up, UART_LCR, 0);
 
+	spin_unlock_irq(&port->lock);
+
 	pm_runtime_mark_last_busy(port->dev);
 	pm_runtime_put_autosuspend(port->dev);
 }
@@ -649,7 +653,8 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id)
 	if ((lsr & UART_LSR_OE) && up->overrun_backoff_time_ms > 0) {
 		unsigned long delay;
 
-		up->ier = port->serial_in(port, UART_IER);
+		spin_lock(&port->lock);
+		up->ier = serial8250_in_IER(up);
 		if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
 			port->ops->stop_rx(port);
 		} else {
@@ -658,6 +663,7 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id)
 			 */
 			cancel_delayed_work(&up->overrun_backoff);
 		}
+		spin_unlock(&port->lock);
 
 		delay = msecs_to_jiffies(up->overrun_backoff_time_ms);
 		schedule_delayed_work(&up->overrun_backoff, delay);
@@ -707,8 +713,10 @@ static int omap_8250_startup(struct uart_port *port)
 	if (ret < 0)
 		goto err;
 
+	spin_lock_irq(&port->lock);
 	up->ier = UART_IER_RLSI | UART_IER_RDI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
+	spin_unlock_irq(&port->lock);
 
 #ifdef CONFIG_PM
 	up->capabilities |= UART_CAP_RPM;
@@ -748,8 +756,10 @@ static void omap_8250_shutdown(struct uart_port *port)
 	if (priv->habit & UART_HAS_EFR2)
 		serial_out(up, UART_OMAP_EFR2, 0x0);
 
+	spin_lock_irq(&port->lock);
 	up->ier = 0;
-	serial_out(up, UART_IER, 0);
+	serial8250_set_IER(up, 0);
+	spin_unlock_irq(&port->lock);
 
 	if (up->dma)
 		serial8250_release_dma(up);
@@ -797,7 +807,7 @@ static void omap_8250_unthrottle(struct uart_port *port)
 		up->dma->rx_dma(up);
 	up->ier |= UART_IER_RLSI | UART_IER_RDI;
 	port->read_status_mask |= UART_LSR_DR;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	spin_unlock_irqrestore(&port->lock, flags);
 
 	pm_runtime_mark_last_busy(port->dev);
@@ -956,7 +966,7 @@ static void __dma_rx_complete(void *param)
 	__dma_rx_do_complete(p);
 	if (!priv->throttled) {
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_out(p, UART_IER, p->ier);
+		serial8250_set_IER(p, p->ier);
 		if (!(priv->habit & UART_HAS_EFR2))
 			omap_8250_rx_dma(p);
 	}
@@ -1013,7 +1023,7 @@ static int omap_8250_rx_dma(struct uart_8250_port *p)
 			 * callback to run.
 			 */
 			p->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
-			serial_out(p, UART_IER, p->ier);
+			serial8250_set_IER(p, p->ier);
 		}
 		goto out;
 	}
@@ -1226,12 +1236,12 @@ static void am654_8250_handle_rx_dma(struct uart_8250_port *up, u8 iir,
 		 * periodic timeouts, re-enable interrupts.
 		 */
 		up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
-		serial_out(up, UART_IER, up->ier);
+		serial8250_set_IER(up, up->ier);
 		omap_8250_rx_dma_flush(up);
 		serial_in(up, UART_IIR);
 		serial_out(up, UART_OMAP_EFR2, 0x0);
 		up->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_out(up, UART_IER, up->ier);
+		serial8250_set_IER(up, up->ier);
 	}
 }
 
@@ -1717,12 +1727,16 @@ static int omap8250_runtime_resume(struct device *dev)
 
 	up = serial8250_get_port(priv->line);
 
+	spin_lock_irq(&up->port.lock);
+
 	if (omap8250_lost_context(up))
 		omap8250_restore_regs(up);
 
 	if (up->dma && up->dma->rxchan && !(priv->habit & UART_HAS_EFR2))
 		omap_8250_rx_dma(up);
 
+	spin_unlock_irq(&up->port.lock);
+
 	priv->latency = priv->calc_latency;
 	schedule_work(&priv->qos_work);
 	return 0;
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..f1976d9a8a38 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -744,6 +744,7 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
 	serial8250_rpm_get(p);
 
 	if (p->capabilities & UART_CAP_SLEEP) {
+		spin_lock_irq(&p->port.lock);
 		if (p->capabilities & UART_CAP_EFR) {
 			lcr = serial_in(p, UART_LCR);
 			efr = serial_in(p, UART_EFR);
@@ -751,25 +752,18 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
 			serial_out(p, UART_EFR, UART_EFR_ECB);
 			serial_out(p, UART_LCR, 0);
 		}
-		serial_out(p, UART_IER, sleep ? UART_IERX_SLEEP : 0);
+		serial8250_set_IER(p, sleep ? UART_IERX_SLEEP : 0);
 		if (p->capabilities & UART_CAP_EFR) {
 			serial_out(p, UART_LCR, UART_LCR_CONF_MODE_B);
 			serial_out(p, UART_EFR, efr);
 			serial_out(p, UART_LCR, lcr);
 		}
+		spin_unlock_irq(&p->port.lock);
 	}
 
 	serial8250_rpm_put(p);
 }
 
-static void serial8250_clear_IER(struct uart_8250_port *up)
-{
-	if (up->capabilities & UART_CAP_UUE)
-		serial_out(up, UART_IER, UART_IER_UUE);
-	else
-		serial_out(up, UART_IER, 0);
-}
-
 #ifdef CONFIG_SERIAL_8250_RSA
 /*
  * Attempts to turn on the RSA FIFO.  Returns zero on failure.
@@ -1033,8 +1027,10 @@ static int broken_efr(struct uart_8250_port *up)
  */
 static void autoconfig_16550a(struct uart_8250_port *up)
 {
+	struct uart_port *port = &up->port;
 	unsigned char status1, status2;
 	unsigned int iersave;
+	bool is_console;
 
 	up->port.type = PORT_16550A;
 	up->capabilities |= UART_CAP_FIFO;
@@ -1150,6 +1146,11 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 		return;
 	}
 
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
 	/*
 	 * Try writing and reading the UART_IER_UUE bit (b6).
 	 * If it works, this is probably one of the Xscale platform's
@@ -1185,6 +1186,9 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 	}
 	serial_out(up, UART_IER, iersave);
 
+	if (is_console)
+		serial8250_exit_unsafe(up);
+
 	/*
 	 * We distinguish between 16550A and U6 16550A by counting
 	 * how many bytes are in the FIFO.
@@ -1226,6 +1230,13 @@ static void autoconfig(struct uart_8250_port *up)
 	up->bugs = 0;
 
 	if (!(port->flags & UPF_BUGGY_UART)) {
+		bool is_console;
+
+		is_console = serial8250_is_console(port);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		/*
 		 * Do a simple existence test first; if we fail this,
 		 * there's no point trying anything else.
@@ -1255,6 +1266,10 @@ static void autoconfig(struct uart_8250_port *up)
 #endif
 		scratch3 = serial_in(up, UART_IER) & UART_IER_ALL_INTR;
 		serial_out(up, UART_IER, scratch);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
+
 		if (scratch2 != 0 || scratch3 != UART_IER_ALL_INTR) {
 			/*
 			 * We failed; there's nothing here
@@ -1376,6 +1391,7 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	unsigned char save_ICP = 0;
 	unsigned int ICP = 0;
 	unsigned long irqs;
+	bool is_console;
 	int irq;
 
 	if (port->flags & UPF_FOURPORT) {
@@ -1385,8 +1401,12 @@ static void autoconfig_irq(struct uart_8250_port *up)
 		inb_p(ICP);
 	}
 
-	if (uart_console(port))
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
 		console_lock();
+		serial8250_enter_unsafe(up);
+	}
 
 	/* forget possible initially masked and pending IRQ */
 	probe_irq_off(probe_irq_on());
@@ -1418,8 +1438,10 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	if (port->flags & UPF_FOURPORT)
 		outb_p(save_ICP, ICP);
 
-	if (uart_console(port))
+	if (is_console) {
+		serial8250_exit_unsafe(up);
 		console_unlock();
+	}
 
 	port->irq = (irq > 0) ? irq : 0;
 }
@@ -1432,7 +1454,7 @@ static void serial8250_stop_rx(struct uart_port *port)
 
 	up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
 	up->port.read_status_mask &= ~UART_LSR_DR;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	serial8250_rpm_put(up);
 }
@@ -1462,7 +1484,7 @@ void serial8250_em485_stop_tx(struct uart_8250_port *p)
 		serial8250_clear_and_reinit_fifos(p);
 
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_port_out(&p->port, UART_IER, p->ier);
+		serial8250_set_IER(p, p->ier);
 	}
 }
 EXPORT_SYMBOL_GPL(serial8250_em485_stop_tx);
@@ -1709,7 +1731,7 @@ static void serial8250_disable_ms(struct uart_port *port)
 	mctrl_gpio_disable_ms(up->gpios);
 
 	up->ier &= ~UART_IER_MSI;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 }
 
 static void serial8250_enable_ms(struct uart_port *port)
@@ -1725,7 +1747,7 @@ static void serial8250_enable_ms(struct uart_port *port)
 	up->ier |= UART_IER_MSI;
 
 	serial8250_rpm_get(up);
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	serial8250_rpm_put(up);
 }
 
@@ -2160,9 +2182,10 @@ static void serial8250_put_poll_char(struct uart_port *port,
 	serial8250_rpm_get(up);
 	/*
 	 *	First save the IER then disable the interrupts
+	 *
+	 *	Best-effort IER access because other CPUs are quiesced.
 	 */
-	ier = serial_port_in(port, UART_IER);
-	serial8250_clear_IER(up);
+	__serial8250_clear_IER(up, NULL, &ier);
 
 	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
 	/*
@@ -2175,7 +2198,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
 	 *	and restore the IER
 	 */
 	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
-	serial_port_out(port, UART_IER, ier);
+	__serial8250_set_IER(up, NULL, ier);
 	serial8250_rpm_put(up);
 }
 
@@ -2186,6 +2209,7 @@ int serial8250_do_startup(struct uart_port *port)
 	struct uart_8250_port *up = up_to_u8250p(port);
 	unsigned long flags;
 	unsigned char iir;
+	bool is_console;
 	int retval;
 	u16 lsr;
 
@@ -2203,21 +2227,25 @@ int serial8250_do_startup(struct uart_port *port)
 	serial8250_rpm_get(up);
 	if (port->type == PORT_16C950) {
 		/* Wake up and initialize UART */
+		spin_lock_irqsave(&port->lock, flags);
 		up->acr = 0;
 		serial_port_out(port, UART_LCR, UART_LCR_CONF_MODE_B);
 		serial_port_out(port, UART_EFR, UART_EFR_ECB);
-		serial_port_out(port, UART_IER, 0);
+		serial8250_set_IER(up, 0);
 		serial_port_out(port, UART_LCR, 0);
 		serial_icr_write(up, UART_CSR, 0); /* Reset the UART */
 		serial_port_out(port, UART_LCR, UART_LCR_CONF_MODE_B);
 		serial_port_out(port, UART_EFR, UART_EFR_ECB);
 		serial_port_out(port, UART_LCR, 0);
+		spin_unlock_irqrestore(&port->lock, flags);
 	}
 
 	if (port->type == PORT_DA830) {
 		/* Reset the port */
-		serial_port_out(port, UART_IER, 0);
+		spin_lock_irqsave(&port->lock, flags);
+		serial8250_set_IER(up, 0);
 		serial_port_out(port, UART_DA830_PWREMU_MGMT, 0);
+		spin_unlock_irqrestore(&port->lock, flags);
 		mdelay(10);
 
 		/* Enable Tx, Rx and free run mode */
@@ -2315,6 +2343,8 @@ int serial8250_do_startup(struct uart_port *port)
 	if (retval)
 		goto out;
 
+	is_console = serial8250_is_console(port);
+
 	if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
 		unsigned char iir1;
 
@@ -2331,6 +2361,9 @@ int serial8250_do_startup(struct uart_port *port)
 		 */
 		spin_lock_irqsave(&port->lock, flags);
 
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		wait_for_xmitr(up, UART_LSR_THRE);
 		serial_port_out_sync(port, UART_IER, UART_IER_THRI);
 		udelay(1); /* allow THRE to set */
@@ -2341,6 +2374,9 @@ int serial8250_do_startup(struct uart_port *port)
 		iir = serial_port_in(port, UART_IIR);
 		serial_port_out(port, UART_IER, 0);
 
+		if (is_console)
+			serial8250_exit_unsafe(up);
+
 		spin_unlock_irqrestore(&port->lock, flags);
 
 		if (port->irqflags & IRQF_SHARED)
@@ -2395,10 +2431,14 @@ int serial8250_do_startup(struct uart_port *port)
 	 * Do a quick test to see if we receive an interrupt when we enable
 	 * the TX irq.
 	 */
+	if (is_console)
+		serial8250_enter_unsafe(up);
 	serial_port_out(port, UART_IER, UART_IER_THRI);
 	lsr = serial_port_in(port, UART_LSR);
 	iir = serial_port_in(port, UART_IIR);
 	serial_port_out(port, UART_IER, 0);
+	if (is_console)
+		serial8250_exit_unsafe(up);
 
 	if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
 		if (!(up->bugs & UART_BUG_TXEN)) {
@@ -2430,7 +2470,7 @@ int serial8250_do_startup(struct uart_port *port)
 	if (up->dma) {
 		const char *msg = NULL;
 
-		if (uart_console(port))
+		if (is_console)
 			msg = "forbid DMA for kernel console";
 		else if (serial8250_request_dma(up))
 			msg = "failed to request DMA";
@@ -2481,7 +2521,7 @@ void serial8250_do_shutdown(struct uart_port *port)
 	 */
 	spin_lock_irqsave(&port->lock, flags);
 	up->ier = 0;
-	serial_port_out(port, UART_IER, 0);
+	serial8250_set_IER(up, 0);
 	spin_unlock_irqrestore(&port->lock, flags);
 
 	synchronize_irq(port->irq);
@@ -2847,7 +2887,7 @@ serial8250_do_set_termios(struct uart_port *port, struct ktermios *termios,
 	if (up->capabilities & UART_CAP_RTOIE)
 		up->ier |= UART_IER_RTOIE;
 
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	if (up->capabilities & UART_CAP_EFR) {
 		unsigned char efr = 0;
@@ -3312,12 +3352,21 @@ EXPORT_SYMBOL_GPL(serial8250_set_defaults);
 
 #ifdef CONFIG_SERIAL_8250_CONSOLE
 
-static void serial8250_console_putchar(struct uart_port *port, unsigned char ch)
+static bool serial8250_console_putchar(struct uart_port *port, unsigned char ch,
+				       struct cons_write_context *wctxt)
 {
 	struct uart_8250_port *up = up_to_u8250p(port);
 
 	wait_for_xmitr(up, UART_LSR_THRE);
+	if (!console_can_proceed(wctxt))
+		return false;
 	serial_port_out(port, UART_TX, ch);
+	if (ch == '\n')
+		up->console_newline_needed = false;
+	else
+		up->console_newline_needed = true;
+
+	return true;
 }
 
 /*
@@ -3346,33 +3395,134 @@ static void serial8250_console_restore(struct uart_8250_port *up)
 	serial8250_out_MCR(up, up->mcr | UART_MCR_DTR | UART_MCR_RTS);
 }
 
+static bool __serial8250_console_write(struct uart_port *port, struct cons_write_context *wctxt,
+		const char *s, unsigned int count,
+		bool (*putchar)(struct uart_port *, unsigned char, struct cons_write_context *))
+{
+	bool finished = false;
+	unsigned int i;
+
+	for (i = 0; i < count; i++, s++) {
+		if (*s == '\n') {
+			if (!putchar(port, '\r', wctxt))
+				goto out;
+		}
+		if (!putchar(port, *s, wctxt))
+			goto out;
+	}
+	finished = true;
+out:
+	return finished;
+}
+
+static bool serial8250_console_write(struct uart_port *port, struct cons_write_context *wctxt,
+		const char *s, unsigned int count,
+		bool (*putchar)(struct uart_port *, unsigned char, struct cons_write_context *))
+{
+	return __serial8250_console_write(port, wctxt, s, count, putchar);
+}
+
+static bool atomic_print_line(struct uart_8250_port *up,
+			      struct cons_write_context *wctxt)
+{
+	struct uart_port *port = &up->port;
+	char buf[4];
+
+	if (up->console_newline_needed &&
+	    !__serial8250_console_write(port, wctxt, "\n", 1, serial8250_console_putchar)) {
+		return false;
+	}
+
+	sprintf(buf, "A%d", raw_smp_processor_id());
+	if (!__serial8250_console_write(port, wctxt, buf, strlen(buf), serial8250_console_putchar))
+		return false;
+
+	return __serial8250_console_write(port, wctxt, wctxt->outbuf, wctxt->len,
+					  serial8250_console_putchar);
+}
+
+static void atomic_console_reacquire(struct cons_write_context *wctxt,
+				     struct cons_write_context *wctxt_init)
+{
+	memcpy(wctxt, wctxt_init, sizeof(*wctxt));
+	while (!console_try_acquire(wctxt)) {
+		cpu_relax();
+		memcpy(wctxt, wctxt_init, sizeof(*wctxt));
+	}
+}
+
 /*
- * Print a string to the serial port using the device FIFO
- *
- * It sends fifosize bytes and then waits for the fifo
- * to get empty.
+ * It should be possible to support a hostile takeover in an unsafe
+ * section if it is write_atomic() that is being taken over. But where
+ * to put this policy?
  */
-static void serial8250_console_fifo_write(struct uart_8250_port *up,
-					  const char *s, unsigned int count)
+bool serial8250_console_write_atomic(struct uart_8250_port *up,
+				     struct cons_write_context *wctxt)
 {
-	int i;
-	const char *end = s + count;
-	unsigned int fifosize = up->tx_loadsz;
-	bool cr_sent = false;
-
-	while (s != end) {
-		wait_for_lsr(up, UART_LSR_THRE);
-
-		for (i = 0; i < fifosize && s != end; ++i) {
-			if (*s == '\n' && !cr_sent) {
-				serial_out(up, UART_TX, '\r');
-				cr_sent = true;
-			} else {
-				serial_out(up, UART_TX, *s++);
-				cr_sent = false;
-			}
+	struct cons_write_context wctxt_init = {};
+	struct cons_context *ctxt_init = &ACCESS_PRIVATE(&wctxt_init, ctxt);
+	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	bool can_print = true;
+	unsigned int ier;
+
+	/* With write_atomic, another context may hold the port->lock. */
+
+	ctxt_init->console = ctxt->console;
+	ctxt_init->prio = ctxt->prio;
+	ctxt_init->thread = ctxt->thread;
+
+	touch_nmi_watchdog();
+
+	/*
+	 * Enter unsafe in order to disable interrupts. If the console is
+	 * lost before the interrupts are disabled, bail out because another
+	 * context took over the printing. If the console is lost after the
+	 * interrupts are disabled, the console must be reacquired in order
+	 * to re-enable the interrupts. However in that case no printing is
+	 * allowed because another context took over the printing.
+	 */
+
+	if (!console_enter_unsafe(wctxt))
+		return false;
+
+	if (!__serial8250_clear_IER(up, wctxt, &ier))
+		return false;
+
+	if (console_exit_unsafe(wctxt)) {
+		can_print = atomic_print_line(up, wctxt);
+		if (!can_print)
+			atomic_console_reacquire(wctxt, &wctxt_init);
+
+		if (can_print) {
+			can_print = console_can_proceed(wctxt);
+			if (can_print)
+				wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
+			else
+				atomic_console_reacquire(wctxt, &wctxt_init);
+		}
+	} else {
+		atomic_console_reacquire(wctxt, &wctxt_init);
+	}
+
+	/*
+	 * Enter unsafe in order to enable interrupts. If the console is
+	 * lost before the interrupts are enabled, the console must be
+	 * reacquired in order to re-enable the interrupts.
+	 */
+
+	for (;;) {
+		if (console_enter_unsafe(wctxt) &&
+		    __serial8250_set_IER(up, wctxt, ier)) {
+			break;
 		}
+
+		/* HW-IRQs still disabled. Reacquire to enable them. */
+		atomic_console_reacquire(wctxt, &wctxt_init);
 	}
+
+	console_exit_unsafe(wctxt);
+
+	return can_print;
 }
 
 /*
@@ -3384,64 +3534,54 @@ static void serial8250_console_fifo_write(struct uart_8250_port *up,
  *	Doing runtime PM is really a bad idea for the kernel console.
  *	Thus, we assume the function is called when device is powered up.
  */
-void serial8250_console_write(struct uart_8250_port *up, const char *s,
-			      unsigned int count)
+bool serial8250_console_write_thread(struct uart_8250_port *up,
+				     struct cons_write_context *wctxt)
 {
 	struct uart_8250_em485 *em485 = up->em485;
 	struct uart_port *port = &up->port;
-	unsigned long flags;
-	unsigned int ier, use_fifo;
-	int locked = 1;
-
-	touch_nmi_watchdog();
-
-	if (oops_in_progress)
-		locked = spin_trylock_irqsave(&port->lock, flags);
-	else
-		spin_lock_irqsave(&port->lock, flags);
+	unsigned int count = wctxt->len;
+	const char *s = wctxt->outbuf;
+	bool finished = false;
+	unsigned int ier;
+	char buf[4];
 
 	/*
 	 *	First save the IER then disable the interrupts
 	 */
-	ier = serial_port_in(port, UART_IER);
-	serial8250_clear_IER(up);
+	if (!console_enter_unsafe(wctxt) ||
+	    !__serial8250_clear_IER(up, wctxt, &ier)) {
+		goto out;
+	}
+	if (!console_exit_unsafe(wctxt))
+		goto out;
 
 	/* check scratch reg to see if port powered off during system sleep */
 	if (up->canary && (up->canary != serial_port_in(port, UART_SCR))) {
+		if (!console_enter_unsafe(wctxt))
+			goto out;
 		serial8250_console_restore(up);
+		if (!console_exit_unsafe(wctxt))
+			goto out;
 		up->canary = 0;
 	}
 
 	if (em485) {
-		if (em485->tx_stopped)
+		if (em485->tx_stopped) {
+			if (!console_enter_unsafe(wctxt))
+				goto out;
 			up->rs485_start_tx(up);
-		mdelay(port->rs485.delay_rts_before_send);
+			if (!console_exit_unsafe(wctxt))
+				goto out;
+		}
+		mdelay(port->rs485.delay_rts_before_send); /* WTF?! Seriously?! */
 	}
 
-	use_fifo = (up->capabilities & UART_CAP_FIFO) &&
-		/*
-		 * BCM283x requires to check the fifo
-		 * after each byte.
-		 */
-		!(up->capabilities & UART_CAP_MINI) &&
-		/*
-		 * tx_loadsz contains the transmit fifo size
-		 */
-		up->tx_loadsz > 1 &&
-		(up->fcr & UART_FCR_ENABLE_FIFO) &&
-		port->state &&
-		test_bit(TTY_PORT_INITIALIZED, &port->state->port.iflags) &&
-		/*
-		 * After we put a data in the fifo, the controller will send
-		 * it regardless of the CTS state. Therefore, only use fifo
-		 * if we don't use control flow.
-		 */
-		!(up->port.flags & UPF_CONS_FLOW);
+	sprintf(buf, "T%d", raw_smp_processor_id());
+	if (serial8250_console_write(port, wctxt, buf, strlen(buf), serial8250_console_putchar))
+		finished = serial8250_console_write(port, wctxt, s, count, serial8250_console_putchar);
 
-	if (likely(use_fifo))
-		serial8250_console_fifo_write(up, s, count);
-	else
-		uart_console_write(port, s, count, serial8250_console_putchar);
+	if (!finished)
+		goto out;
 
 	/*
 	 *	Finally, wait for transmitter to become empty
@@ -3450,12 +3590,20 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
 	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
 
 	if (em485) {
-		mdelay(port->rs485.delay_rts_after_send);
-		if (em485->tx_stopped)
+		mdelay(port->rs485.delay_rts_after_send); /* WTF?! Seriously?! */
+		if (em485->tx_stopped) {
+			if (!console_enter_unsafe(wctxt))
+				goto out;
 			up->rs485_stop_tx(up);
+			if (!console_exit_unsafe(wctxt))
+				goto out;
+		}
 	}
-
-	serial_port_out(port, UART_IER, ier);
+	if (!console_enter_unsafe(wctxt))
+		goto out;
+	WARN_ON_ONCE(!__serial8250_set_IER(up, wctxt, ier));
+	if (!console_exit_unsafe(wctxt))
+		goto out;
 
 	/*
 	 *	The receive handling will happen properly because the
@@ -3464,11 +3612,15 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
 	 *	call it if we have saved something in the saved flags
 	 *	while processing with interrupts off.
 	 */
-	if (up->msr_saved_flags)
+	if (up->msr_saved_flags) {
+		if (!console_enter_unsafe(wctxt))
+			goto out;
 		serial8250_modem_status(up);
-
-	if (locked)
-		spin_unlock_irqrestore(&port->lock, flags);
+		if (!console_exit_unsafe(wctxt))
+			goto out;
+	}
+out:
+	return finished;
 }
 
 static unsigned int probe_baud(struct uart_port *port)
@@ -3488,6 +3640,7 @@ static unsigned int probe_baud(struct uart_port *port)
 
 int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
 {
+	struct uart_8250_port *up = up_to_u8250p(port);
 	int baud = 9600;
 	int bits = 8;
 	int parity = 'n';
@@ -3497,6 +3650,8 @@ int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
 	if (!port->iobase && !port->membase)
 		return -ENODEV;
 
+	up->console_newline_needed = false;
+
 	if (options)
 		uart_parse_options(options, &baud, &parity, &bits, &flow);
 	else if (probe)
diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
index 978dc196c29b..22656e8370ea 100644
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -9,6 +9,7 @@ config SERIAL_8250
 	depends on !S390
 	select SERIAL_CORE
 	select SERIAL_MCTRL_GPIO if GPIOLIB
+	select HAVE_ATOMIC_CONSOLE
 	help
 	  This selects whether you want to include the driver for the standard
 	  serial ports.  The standard answer is Y.  People who might say N
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2bd32c8ece39..9901f916dc1a 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2336,8 +2336,11 @@ int uart_suspend_port(struct uart_driver *drv, struct uart_port *uport)
 	 * able to Re-start_rx later.
 	 */
 	if (!console_suspend_enabled && uart_console(uport)) {
-		if (uport->ops->start_rx)
+		if (uport->ops->start_rx) {
+			spin_lock_irq(&uport->lock);
 			uport->ops->stop_rx(uport);
+			spin_unlock_irq(&uport->lock);
+		}
 		goto unlock;
 	}
 
@@ -2430,8 +2433,11 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
 		if (console_suspend_enabled)
 			uart_change_pm(state, UART_PM_STATE_ON);
 		uport->ops->set_termios(uport, &termios, NULL);
-		if (!console_suspend_enabled && uport->ops->start_rx)
+		if (!console_suspend_enabled && uport->ops->start_rx) {
+			spin_lock_irq(&uport->lock);
 			uport->ops->start_rx(uport);
+			spin_unlock_irq(&uport->lock);
+		}
 		if (console_suspend_enabled)
 			console_start(uport->cons);
 	}
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 19376bee9667..9055a22992ed 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -125,6 +125,8 @@ struct uart_8250_port {
 #define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA
 	unsigned char		msr_saved_flags;
 
+	bool			console_newline_needed;
+
 	struct uart_8250_dma	*dma;
 	const struct uart_8250_ops *ops;
 
@@ -139,6 +141,9 @@ struct uart_8250_port {
 	/* Serial port overrun backoff */
 	struct delayed_work overrun_backoff;
 	u32 overrun_backoff_time_ms;
+
+	struct cons_write_context wctxt;
+	int cookie;
 };
 
 static inline struct uart_8250_port *up_to_u8250p(struct uart_port *up)
@@ -178,8 +183,10 @@ void serial8250_tx_chars(struct uart_8250_port *up);
 unsigned int serial8250_modem_status(struct uart_8250_port *up);
 void serial8250_init_port(struct uart_8250_port *up);
 void serial8250_set_defaults(struct uart_8250_port *up);
-void serial8250_console_write(struct uart_8250_port *up, const char *s,
-			      unsigned int count);
+bool serial8250_console_write_atomic(struct uart_8250_port *up,
+				     struct cons_write_context *wctxt);
+bool serial8250_console_write_thread(struct uart_8250_port *up,
+				     struct cons_write_context *wctxt);
 int serial8250_console_setup(struct uart_port *port, char *options, bool probe);
 int serial8250_console_exit(struct uart_port *port);
 
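For readers skimming the hunks above: the recurring change replaces raw
UART_IER reads/writes with console-aware helpers. A minimal sketch of the
shape those helpers take (the real definitions come from earlier patches
in this series; the body below is assumed, mirroring the open-coded
pattern visible in mtk8250_disable_intrs() above):

	/* Sketch only: illustrates the calling convention, not the exact body. */
	static inline void serial8250_set_IER(struct uart_8250_port *up, unsigned char ier)
	{
		bool is_console = serial8250_is_console(&up->port);

		if (is_console)
			serial8250_enter_unsafe(up);	/* mark the register window unsafe */

		serial_out(up, UART_IER, ier);

		if (is_console)
			serial8250_exit_unsafe(up);
	}

The enter/exit pair presumably requires port->lock to be held, which is why
several hunks above add spin_lock_irq()/spin_unlock_irq() around the new
calls.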

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
  2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
@ 2023-03-03  0:19   ` kernel test robot
  2023-03-03 10:55     ` John Ogness
  2023-03-31 10:29   ` dropped handling: was: " Petr Mladek
  2023-03-31 10:36   ` semantic: " Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: kernel test robot @ 2023-03-03  0:19 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

Hi John,

I love your patch! Perhaps something to improve:

[auto build test WARNING on 10d639febe5629687dac17c4a7500a96537ce11a]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
base:   10d639febe5629687dac17c4a7500a96537ce11a
patch link:    https://lore.kernel.org/r/20230302195618.156940-11-john.ogness%40linutronix.de
patch subject: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
config: nios2-buildonly-randconfig-r004-20230302 (https://download.01.org/0day-ci/archive/20230303/202303030859.j7DLimWU-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/cae46beabb2dfe79a4c4c602601fa538a8d840f7
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
        git checkout cae46beabb2dfe79a4c4c602601fa538a8d840f7
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 SHELL=/bin/bash kernel/printk/

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303030859.j7DLimWU-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/printk/printk.c:2841:6: warning: no previous prototype for 'printk_get_next_message' [-Wmissing-prototypes]
    2841 | bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
         |      ^~~~~~~~~~~~~~~~~~~~~~~


vim +/printk_get_next_message +2841 kernel/printk/printk.c

  2821	
  2822	/*
  2823	 * Read and format the specified record (or a later record if the specified
  2824	 * record is not available).
  2825	 *
  2826	 * @pmsg will contain the formatted result. @pmsg->pbufs must point to a
  2827	 * struct printk_buffers.
  2828	 *
  2829	 * @seq is the record to read and format. If it is not available, the next
  2830	 * valid record is read.
  2831	 *
  2832	 * @is_extended specifies if the message should be formatted for extended
  2833	 * console output.
  2834	 *
  2835	 * @may_supress specifies if records may be skipped based on loglevel.
  2836	 *
  2837	 * Returns false if no record is available. Otherwise true and all fields
  2838	 * of @pmsg are valid. (See the documentation of struct printk_message
  2839	 * for information about the @pmsg fields.)
  2840	 */
> 2841	bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
  2842				     bool is_extended, bool may_suppress)
  2843	{
  2844		static int panic_console_dropped;
  2845	
  2846		struct printk_buffers *pbufs = pmsg->pbufs;
  2847		const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
  2848		const size_t outbuf_sz = sizeof(pbufs->outbuf);
  2849		char *scratchbuf = &pbufs->scratchbuf[0];
  2850		char *outbuf = &pbufs->outbuf[0];
  2851		struct printk_info info;
  2852		struct printk_record r;
  2853		size_t len = 0;
  2854	
  2855		/*
  2856		 * Formatting extended messages requires a separate buffer, so use the
  2857		 * scratch buffer to read in the ringbuffer text.
  2858		 *
  2859		 * Formatting normal messages is done in-place, so read the ringbuffer
  2860		 * text directly into the output buffer.
  2861		 */
  2862		if (is_extended)
  2863			prb_rec_init_rd(&r, &info, scratchbuf, scratchbuf_sz);
  2864		else
  2865			prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
  2866	
  2867		if (!prb_read_valid(prb, seq, &r))
  2868			return false;
  2869	
  2870		pmsg->seq = r.info->seq;
  2871		pmsg->dropped = r.info->seq - seq;
  2872	
  2873		/*
  2874		 * Check for dropped messages in panic here so that printk
  2875		 * suppression can occur as early as possible if necessary.
  2876		 */
  2877		if (pmsg->dropped &&
  2878		    panic_in_progress() &&
  2879		    panic_console_dropped++ > 10) {
  2880			suppress_panic_printk = 1;
  2881			pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
  2882		}
  2883	
  2884		/* Skip record that has level above the console loglevel. */
  2885		if (may_suppress && suppress_message_printing(r.info->level))
  2886			goto out;
  2887	
  2888		if (is_extended) {
  2889			len = info_print_ext_header(outbuf, outbuf_sz, r.info);
  2890			len += msg_print_ext_body(outbuf + len, outbuf_sz - len,
  2891						  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
  2892		} else {
  2893			len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
  2894		}
  2895	out:
  2896		pmsg->outbuf_len = len;
  2897		return true;
  2898	}
  2899	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
@ 2023-03-03  1:23   ` kernel test robot
  2023-03-03 10:56     ` John Ogness
  2023-04-05 10:48   ` boot console: was: " Petr Mladek
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 92+ messages in thread
From: kernel test robot @ 2023-03-03  1:23 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

Hi John,

I love your patch! Yet something to improve:

[auto build test ERROR on 10d639febe5629687dac17c4a7500a96537ce11a]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
base:   10d639febe5629687dac17c4a7500a96537ce11a
patch link:    https://lore.kernel.org/r/20230302195618.156940-12-john.ogness%40linutronix.de
patch subject: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
config: nios2-buildonly-randconfig-r004-20230302 (https://download.01.org/0day-ci/archive/20230303/202303030957.Hkt9zcFz-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/72ef8a364036e7e813e7f7dfa8d37a4466d1ca8a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
        git checkout 72ef8a364036e7e813e7f7dfa8d37a4466d1ca8a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 SHELL=/bin/bash kernel/printk/

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303030957.Hkt9zcFz-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/printk/printk.c:2802:6: warning: no previous prototype for 'printk_get_next_message' [-Wmissing-prototypes]
    2802 | bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
         |      ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c: In function 'console_flush_all':
>> kernel/printk/printk.c:2979:30: error: implicit declaration of function 'console_is_usable'; did you mean 'console_exit_unsafe'? [-Werror=implicit-function-declaration]
    2979 |                         if (!console_is_usable(con, flags))
         |                              ^~~~~~~~~~~~~~~~~
         |                              console_exit_unsafe
   cc1: some warnings being treated as errors


vim +2979 kernel/printk/printk.c

a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2933  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2934  /*
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2935   * Print out all remaining records to all consoles.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2936   *
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2937   * @do_cond_resched is set by the caller. It can be true only in schedulable
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2938   * context.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2939   *
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2940   * @next_seq is set to the sequence number after the last available record.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2941   * The value is valid only when this function returns true. It means that all
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2942   * usable consoles are completely flushed.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2943   *
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2944   * @handover will be set to true if a printk waiter has taken over the
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2945   * console_lock, in which case the caller is no longer holding the
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2946   * console_lock. Otherwise it is set to false.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2947   *
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2948   * Returns true when there was at least one usable console and all messages
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2949   * were flushed to all usable consoles. A returned false informs the caller
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2950   * that everything was not flushed (either there were no usable consoles or
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2951   * another context has taken over printing or it is a panic situation and this
5831788afb17b89 kernel/printk/printk.c Petr Mladek             2022-06-23  2952   * is not the panic CPU). Regardless the reason, the caller should assume it
5831788afb17b89 kernel/printk/printk.c Petr Mladek             2022-06-23  2953   * is not useful to immediately try again.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2954   *
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2955   * Requires the console_lock.
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2956   */
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2957  static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2958  {
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2959  	bool any_usable = false;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2960  	struct console *con;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2961  	bool any_progress;
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2962  	int cookie;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2963  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2964  	*next_seq = 0;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2965  	*handover = false;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2966  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2967  	do {
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2968  		any_progress = false;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2969  
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2970  		cookie = console_srcu_read_lock();
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2971  		for_each_console_srcu(con) {
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02  2972  			short flags = console_srcu_read_flags(con);
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2973  			bool progress;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2974  
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02  2975  			/* console_flush_all() is only for legacy consoles. */
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02  2976  			if (flags & CON_NO_BKL)
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02  2977  				continue;
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02  2978  
cfa886eee9834d5 kernel/printk/printk.c Thomas Gleixner         2023-03-02 @2979  			if (!console_is_usable(con, flags))
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2980  				continue;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2981  			any_usable = true;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2982  
daaab5b5bba36a5 kernel/printk/printk.c John Ogness             2023-01-09  2983  			progress = console_emit_next_record(con, handover, cookie);
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2984  
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2985  			/*
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2986  			 * If a handover has occurred, the SRCU read lock
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2987  			 * is already released.
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  2988  			 */
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2989  			if (*handover)
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2990  				return false;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2991  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2992  			/* Track the next of the highest seq flushed. */
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2993  			if (con->seq > *next_seq)
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2994  				*next_seq = con->seq;
8d91f8b15361dfb kernel/printk/printk.c Tejun Heo               2016-01-15  2995  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2996  			if (!progress)
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2997  				continue;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2998  			any_progress = true;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  2999  
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  3000  			/* Allow panic_cpu to take over the consoles safely. */
8ebc476fd51e6c0 kernel/printk/printk.c Stephen Brennan         2022-02-02  3001  			if (abandon_console_lock_in_panic())
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3002  				goto abandon;
8ebc476fd51e6c0 kernel/printk/printk.c Stephen Brennan         2022-02-02  3003  
8d91f8b15361dfb kernel/printk/printk.c Tejun Heo               2016-01-15  3004  			if (do_cond_resched)
8d91f8b15361dfb kernel/printk/printk.c Tejun Heo               2016-01-15  3005  				cond_resched();
^1da177e4c3f415 kernel/printk.c        Linus Torvalds          2005-04-16  3006  		}
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3007  		console_srcu_read_unlock(cookie);
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  3008  	} while (any_progress);
dbdda842fe96f89 kernel/printk/printk.c Steven Rostedt (VMware  2018-01-10  3009) 
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  3010  	return any_usable;
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3011  
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3012  abandon:
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3013  	console_srcu_read_unlock(cookie);
fc956ae0de7fa25 kernel/printk/printk.c John Ogness             2022-11-16  3014  	return false;
a699449bb13b70b kernel/printk/printk.c John Ogness             2022-04-21  3015  }
fe3d8ad31cf51b0 kernel/printk.c        Feng Tang               2011-03-22  3016  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
  2023-03-03  0:19   ` kernel test robot
@ 2023-03-03 10:55     ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-03 10:55 UTC (permalink / raw)
  To: kernel test robot, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

On 2023-03-03, kernel test robot <lkp@intel.com> wrote:
>>> kernel/printk/printk.c:2841:6: warning: no previous prototype for 'printk_get_next_message' [-Wmissing-prototypes]
>     2841 | bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>          |      ^~~~~~~~~~~~~~~~~~~~~~~

This function needs to be declared for !CONFIG_PRINTK as well.

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 8856beed65da..60d6bf18247e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -188,10 +188,11 @@ struct cons_context_data {
 	struct printk_buffers		pbufs;
 };
 
-#ifdef CONFIG_PRINTK
-
 bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 			     bool is_extended, bool may_supress);
+
+#ifdef CONFIG_PRINTK
+
 void console_prepend_dropped(struct printk_message *pmsg,
 			     unsigned long dropped);
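
For anyone unfamiliar with the warning, a minimal sketch of the failure
mode (hypothetical names, not the printk sources):

	#include <stdbool.h>

	#ifdef CONFIG_PRINTK
	bool foo(void);		/* prototype only visible when CONFIG_PRINTK is set */
	#endif

	/*
	 * Definition built unconditionally: with CONFIG_PRINTK disabled, a
	 * W=1 build warns "no previous prototype for 'foo'" because no
	 * declaration was in scope.
	 */
	bool foo(void)
	{
		return true;
	}

Moving the declaration outside the #ifdef, as in the diff above, keeps a
prototype in scope in every configuration.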
 

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-03  1:23   ` kernel test robot
@ 2023-03-03 10:56     ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-03 10:56 UTC (permalink / raw)
  To: kernel test robot, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

On 2023-03-03, kernel test robot <lkp@intel.com> wrote:
>    kernel/printk/printk.c: In function 'console_flush_all':
>>> kernel/printk/printk.c:2979:30: error: implicit declaration of function 'console_is_usable'; did you mean 'console_exit_unsafe'? [-Werror=implicit-function-declaration]
>     2979 |                         if (!console_is_usable(con, flags))
>          |                              ^~~~~~~~~~~~~~~~~
>          |                              console_exit_unsafe

This macro needs to be defined for !CONFIG_PRINTK as well.

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 60d6bf18247e..e4fb600daf06 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -147,6 +147,7 @@ static inline void cons_kthread_create(struct console *con) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 static inline bool cons_nobkl_init(struct console *con) { return true; }
 static inline void cons_nobkl_cleanup(struct console *con) { }
+static inline bool console_is_usable(struct console *con, short flags) { return false; }
 
 #endif /* CONFIG_PRINTK */
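
Presumably returning false is the right degradation here: with
!CONFIG_PRINTK there is nothing to print, so the existing check in
console_flush_all() quoted above simply skips every console:

	if (!console_is_usable(con, flags))	/* stub always returns false... */
		continue;			/* ...so every console is skipped */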
 

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-02 19:56 ` [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic John Ogness
@ 2023-03-06  9:07   ` Dan Carpenter
  2023-03-06  9:39     ` John Ogness
  2023-03-13 16:07   ` Petr Mladek
  2023-03-17 17:34   ` simplify: was: " Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Dan Carpenter @ 2023-03-06  9:07 UTC (permalink / raw)
  To: oe-kbuild, John Ogness, Petr Mladek
  Cc: lkp, oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

Hi John,

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
base:   10d639febe5629687dac17c4a7500a96537ce11a
patch link:    https://lore.kernel.org/r/20230302195618.156940-7-john.ogness%40linutronix.de
patch subject: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
config: i386-randconfig-m021 (https://download.01.org/0day-ci/archive/20230305/202303051319.m55kZE3v-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Link: https://lore.kernel.org/r/202303051319.m55kZE3v-lkp@intel.com/

smatch warnings:
kernel/printk/printk_nobkl.c:391 cons_try_acquire_spin() warn: signedness bug returning '(-16)'

vim +391 kernel/printk/printk_nobkl.c

d444c8549ebdf3 Thomas Gleixner 2023-03-02  284  /**
d444c8549ebdf3 Thomas Gleixner 2023-03-02  285   * cons_try_acquire_spin - Complete the spinwait attempt
d444c8549ebdf3 Thomas Gleixner 2023-03-02  286   * @ctxt:	Pointer to an acquire context that contains
d444c8549ebdf3 Thomas Gleixner 2023-03-02  287   *		all information about the acquire mode
d444c8549ebdf3 Thomas Gleixner 2023-03-02  288   *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  289   * @ctxt->hov_state contains the handover state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02  290   * state[REQ]
d444c8549ebdf3 Thomas Gleixner 2023-03-02  291   * @ctxt->req_state contains the request state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02  292   * state[CUR]
d444c8549ebdf3 Thomas Gleixner 2023-03-02  293   *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  294   * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
d444c8549ebdf3 Thomas Gleixner 2023-03-02  295   * unexpected state values.

Out of date comments.

d444c8549ebdf3 Thomas Gleixner 2023-03-02  296   *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  297   * On success @ctxt->state contains the new state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02  298   * state[CUR]
d444c8549ebdf3 Thomas Gleixner 2023-03-02  299   *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  300   * On -EBUSY failure this context timed out. This context should either
d444c8549ebdf3 Thomas Gleixner 2023-03-02  301   * give up or attempt a hostile takeover.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  302   *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  303   * On -EAGAIN failure this context encountered unexpected state values.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  304   * This context should retry the full handover request setup process (the
d444c8549ebdf3 Thomas Gleixner 2023-03-02  305   * handover request setup by cons_setup_handover() is now invalidated and
d444c8549ebdf3 Thomas Gleixner 2023-03-02  306   * must be performed again).

Out of date.

d444c8549ebdf3 Thomas Gleixner 2023-03-02  307   */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  308  static bool cons_try_acquire_spin(struct cons_context *ctxt)
                                                       ^^^^
After reviewing the code, it looks like the intention was for the bool to
be changed to int.

d444c8549ebdf3 Thomas Gleixner 2023-03-02  309  {
d444c8549ebdf3 Thomas Gleixner 2023-03-02  310  	struct console *con = ctxt->console;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  311  	struct cons_state cur;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  312  	struct cons_state new;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  313  	int err = -EAGAIN;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  314  	int timeout;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  315  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  316  	/* Now wait for the other side to hand over */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  317  	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
d444c8549ebdf3 Thomas Gleixner 2023-03-02  318  		/* Timeout immediately if a remote panic is detected. */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  319  		if (cons_check_panic())
d444c8549ebdf3 Thomas Gleixner 2023-03-02  320  			break;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  321  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  322  		cons_state_read(con, CON_STATE_CUR, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  323  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  324  		/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  325  		 * If the real state of the console matches the handover state
d444c8549ebdf3 Thomas Gleixner 2023-03-02  326  		 * that this context setup, then the handover was a success
d444c8549ebdf3 Thomas Gleixner 2023-03-02  327  		 * and this context is now the owner.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  328  		 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02  329  		 * Note that this might have raced with a new higher priority
d444c8549ebdf3 Thomas Gleixner 2023-03-02  330  		 * requester coming in after the lock was handed over.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  331  		 * However, that requester will see that the owner changes and
d444c8549ebdf3 Thomas Gleixner 2023-03-02  332  		 * setup a new request for the current owner (this context).
d444c8549ebdf3 Thomas Gleixner 2023-03-02  333  		 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  334  		if (cons_state_bits_match(cur, ctxt->hov_state))
d444c8549ebdf3 Thomas Gleixner 2023-03-02  335  			goto success;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  336  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  337  		/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  338  		 * If state changed since the request was made, give up as
d444c8549ebdf3 Thomas Gleixner 2023-03-02  339  		 * it is no longer consistent. This must include
d444c8549ebdf3 Thomas Gleixner 2023-03-02  340  		 * state::req_prio since there could be a higher priority
d444c8549ebdf3 Thomas Gleixner 2023-03-02  341  		 * request available.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  342  		 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  343  		if (cur.bits != ctxt->req_state.bits)
d444c8549ebdf3 Thomas Gleixner 2023-03-02  344  			goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  345  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  346  		/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  347  		 * Finally check whether the handover state is still
d444c8549ebdf3 Thomas Gleixner 2023-03-02  348  		 * the same.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  349  		 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  350  		cons_state_read(con, CON_STATE_REQ, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  351  		if (cur.atom != ctxt->hov_state.atom)
d444c8549ebdf3 Thomas Gleixner 2023-03-02  352  			goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  353  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  354  		/* Account time */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  355  		if (timeout > 0)
d444c8549ebdf3 Thomas Gleixner 2023-03-02  356  			udelay(1);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  357  	}
d444c8549ebdf3 Thomas Gleixner 2023-03-02  358  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  359  	/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  360  	 * Timeout. Cleanup the handover state and carefully try to reset
d444c8549ebdf3 Thomas Gleixner 2023-03-02  361  	 * req_prio in the real state. The reset is important to ensure
d444c8549ebdf3 Thomas Gleixner 2023-03-02  362  	 * that the owner does not hand over the lock after this context
d444c8549ebdf3 Thomas Gleixner 2023-03-02  363  	 * has given up waiting.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  364  	 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  365  	cons_cleanup_handover(ctxt);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  366  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  367  	cons_state_read(con, CON_STATE_CUR, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  368  	do {
d444c8549ebdf3 Thomas Gleixner 2023-03-02  369  		/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  370  		 * The timeout might have raced with the owner coming late
d444c8549ebdf3 Thomas Gleixner 2023-03-02  371  		 * and handing it over gracefully.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  372  		 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  373  		if (cons_state_bits_match(cur, ctxt->hov_state))
d444c8549ebdf3 Thomas Gleixner 2023-03-02  374  			goto success;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  375  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  376  		/*
d444c8549ebdf3 Thomas Gleixner 2023-03-02  377  		 * Validate that the state matches with the state at request
d444c8549ebdf3 Thomas Gleixner 2023-03-02  378  		 * time. If this check fails, there is already a higher
d444c8549ebdf3 Thomas Gleixner 2023-03-02  379  		 * priority context waiting or the owner has changed (either
d444c8549ebdf3 Thomas Gleixner 2023-03-02  380  		 * by higher priority or by hostile takeover). In all fail
d444c8549ebdf3 Thomas Gleixner 2023-03-02  381  		 * cases this context is no longer in line for a handover to
d444c8549ebdf3 Thomas Gleixner 2023-03-02  382  		 * take place, so no reset is necessary.
d444c8549ebdf3 Thomas Gleixner 2023-03-02  383  		 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  384  		if (cur.bits != ctxt->req_state.bits)
d444c8549ebdf3 Thomas Gleixner 2023-03-02  385  			goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  386  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  387  		copy_full_state(new, cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  388  		new.req_prio = 0;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  389  	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
d444c8549ebdf3 Thomas Gleixner 2023-03-02  390  	/* Reset worked. Report timeout. */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 @391  	return -EBUSY;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  392  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  393  success:
d444c8549ebdf3 Thomas Gleixner 2023-03-02  394  	/* Store the real state */
d444c8549ebdf3 Thomas Gleixner 2023-03-02  395  	copy_full_state(ctxt->state, cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  396  	ctxt->hostile = false;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  397  	err = 0;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  398  
d444c8549ebdf3 Thomas Gleixner 2023-03-02  399  cleanup:
d444c8549ebdf3 Thomas Gleixner 2023-03-02  400  	cons_cleanup_handover(ctxt);
d444c8549ebdf3 Thomas Gleixner 2023-03-02  401  	return err;
d444c8549ebdf3 Thomas Gleixner 2023-03-02  402  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-06  9:07   ` Dan Carpenter
@ 2023-03-06  9:39     ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-06  9:39 UTC (permalink / raw)
  To: Dan Carpenter, oe-kbuild, Petr Mladek
  Cc: lkp, oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman

On 2023-03-06, Dan Carpenter <error27@gmail.com> wrote:
> smatch warnings:
> kernel/printk/printk_nobkl.c:391 cons_try_acquire_spin() warn: signedness bug returning '(-16)'

Great catch. That function used to return bool, but was recently changed
to int. The consequence of the bug is that a waiter could prematurely abort
the spin.
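
For illustration, a minimal sketch of this bug class (hypothetical names,
not from the series): with a bool return type the negative error code is
silently converted to true, so a caller checking for a negative value
never sees it.

	static bool broken_try_acquire(void)
	{
		return -EBUSY;	/* bool return type converts -16 to true (1) */
	}

	static int caller(void)
	{
		int err = broken_try_acquire();	/* err == 1, not -EBUSY */

		if (err == -EBUSY)		/* never taken */
			return err;
		return 0;
	}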

diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 78136347a328..bcd75e5bd9c8 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -305,7 +305,7 @@ static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
  * handover request setup by cons_setup_handover() is now invalidated and
  * must be performed again).
  */
-static bool cons_try_acquire_spin(struct cons_context *ctxt)
+static int cons_try_acquire_spin(struct cons_context *ctxt)
 {
 	struct console *con = ctxt->console;
 	struct cons_state cur;

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 01/18] kdb: do not assume write() callback available
  2023-03-02 19:56 ` [PATCH printk v1 01/18] kdb: do not assume write() callback available John Ogness
@ 2023-03-07 14:57   ` Petr Mladek
  2023-03-07 16:34   ` Doug Anderson
  2023-03-09 10:52   ` Daniel Thompson
  2 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-07 14:57 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport

On Thu 2023-03-02 21:02:01, John Ogness wrote:
> It is allowed for consoles to provide no write() callback. For
> example ttynull does this.
> 
> Check if a write() callback is available before using it.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem()
  2023-03-02 19:56 ` [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem() John Ogness
@ 2023-03-07 16:05   ` Petr Mladek
  2023-03-17 11:37     ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-07 16:05 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2023-03-02 21:02:02, John Ogness wrote:
> The printk path is NMI safe because it only adds content to the
> buffer and then triggers the delayed output via irq_work. If the
> console is flushed or unblanked (on panic) from NMI then it can
> deadlock in down_trylock_console_sem() because the semaphore is not
> NMI safe.

Do you have any particular code path in mind, please?
This does not work in console_flush_on_panic(), see below.

> Avoid try-locking the console from NMI and assume it failed.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/printk.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 40c5f4170ac7..84af038292d9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -318,6 +318,10 @@ static int __down_trylock_console_sem(unsigned long ip)
>  	int lock_failed;
>  	unsigned long flags;
>  
> +	/* Semaphores are not NMI-safe. */
> +	if (in_nmi())
> +		return 1;

console_flush_on_panic() ignores the console_trylock() return value:

void console_flush_on_panic(enum con_flush_mode mode)
{
[...]
	/*
	 * If someone else is holding the console lock, trylock will fail
	 * and may_schedule may be set.  Ignore and proceed to unlock so
	 * that messages are flushed out.  As this can be called from any
	 * context and we don't want to get preempted while flushing,
	 * ensure may_schedule is cleared.
	 */
	console_trylock();
	console_may_schedule = 0;
	console_unlock();
}

So this change would cause a non-paired console_unlock().
And console_unlock might still deadlock on the console_sem->lock.


OK, your change makes sense. But we should still try flushing
the messages in console_flush_on_panic() even in NMI.

One solution would be to call console_flush_all() directly in
console_flush_on_panic() without taking console_lock().
It should not be worse than the current code which ignores
the console_trylock() return value.

Note that it mostly works because console_flush_on_panic() is called
when other CPUs are supposed to be stopped.

We only would need to prevent other CPUs from flushing messages
as well if they were still running by chance. But we actually already
do this, see abandon_console_lock_in_panic(). Well, we should
make sure that the abandon_console_lock_in_panic() check is
done before flushing the first message.

All these changes together would prevent deadlock on console_sem->lock.
But the synchronization "guarantees" should stay the same.
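
A rough sketch of that idea (assuming console_flush_all() keeps its
current signature and is made callable from here):

	void console_flush_on_panic(enum con_flush_mode mode)
	{
		bool handover;
		u64 next_seq;

		/*
		 * Deliberately no console_trylock()/console_unlock():
		 * the semaphore is not NMI safe and the other CPUs are
		 * supposed to be stopped at this point anyway.
		 */
		if (mode == CONSOLE_REPLAY_ALL) {
			/* ... existing sequence reset ... */
		}

		console_flush_all(false, &next_seq, &handover);
	}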

> +
>  	/*
>  	 * Here and in __up_console_sem() we need to be in safe mode,
>  	 * because spindump/WARN/etc from under console ->lock will

An alternative solution would be to make the generic down_trylock() safe
in NMI or in panic(). It might do spin_trylock() when oops_in_progress
is set. I mean to do the same trick as console drivers do with
port->lock.

But I am not sure if other down_trylock() users would be happy with
this change. Yes, it might get solved by introducing down_trylock_panic()
that might be used only in console_flush_on_panic(). But it might
be more hairy than the solution proposed above.
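
Purely as a sketch of that alternative (the body mirrors the current
down_trylock(); down_trylock_panic() would be the new, panic-only
variant):

	int down_trylock_panic(struct semaphore *sem)
	{
		unsigned long flags;
		int count;

		/* The same trick the console drivers do with port->lock. */
		if (oops_in_progress) {
			if (!raw_spin_trylock_irqsave(&sem->lock, flags))
				return 1;
		} else {
			raw_spin_lock_irqsave(&sem->lock, flags);
		}

		count = sem->count - 1;
		if (count >= 0)
			sem->count = count;
		raw_spin_unlock_irqrestore(&sem->lock, flags);

		return (count < 0);
	}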

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 01/18] kdb: do not assume write() callback available
  2023-03-02 19:56 ` [PATCH printk v1 01/18] kdb: do not assume write() callback available John Ogness
  2023-03-07 14:57   ` Petr Mladek
@ 2023-03-07 16:34   ` Doug Anderson
  2023-03-09 10:52   ` Daniel Thompson
  2 siblings, 0 replies; 92+ messages in thread
From: Doug Anderson @ 2023-03-07 16:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport

Hi,

On Thu, Mar 2, 2023 at 11:57 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> It is allowed for consoles to provide no write() callback. For
> example ttynull does this.
>
> Check if a write() callback is available before using it.
>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/debug/kdb/kdb_io.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 5c7e9ba7cd6b..e9139dfc1f0a 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -576,6 +576,8 @@ static void kdb_msg_write(const char *msg, int msg_len)
>                         continue;
>                 if (c == dbg_io_ops->cons)
>                         continue;
> +               if (!c->write)
> +                       continue;

Reviewed-by: Douglas Anderson <dianders@chromium.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 03/18] printk: Consolidate console deferred printing
  2023-03-02 19:56 ` [PATCH printk v1 03/18] printk: Consolidate console deferred printing John Ogness
@ 2023-03-08 13:15   ` Petr Mladek
  2023-03-17 13:05     ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-08 13:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2023-03-02 21:02:03, John Ogness wrote:
> Printing to consoles can be deferred for several reasons:
> 
> - explicitly with printk_deferred()
> - printk() in NMI context
> - recursive printk() calls
> 
> The current implementation is not consistent. For printk_deferred(),
> irq work is scheduled twice. For NMI and recursive, panic CPU
> suppression and caller delays are not properly enforced.
> 
> Correct these inconsistencies by consolidating the deferred printing
> code so that vprintk_deferred() is the toplevel function for
> deferred printing and vprintk_emit() will perform whichever irq_work
> queueing is appropriate.
> 
> Also add kerneldoc for wake_up_klogd() and defer_console_output() to
> clarify their differences and appropriate usage.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2321,7 +2321,10 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		preempt_enable();
>  	}
>  
> -	wake_up_klogd();
> +	if (in_sched)
> +		defer_console_output();
> +	else
> +		wake_up_klogd();

Nit: I would add an empty line here. Or I would move this up into the
     previous if (in_sched()) condition.

>  	return printed_len;
>  }
>  EXPORT_SYMBOL(vprintk_emit);
> @@ -3811,11 +3814,30 @@ static void __wake_up_klogd(int val)
>  	preempt_enable();
>  }
>  
> +/**
> + * wake_up_klogd - Wake kernel logging daemon
> + *
> + * Use this function when new records have been added to the ringbuffer
> + * and the console printing for those records is handled elsewhere. In

"elsewhere" is ambiguous. I would write:

"and the console printing for those records maybe handled in this context".

> + * this case only the logging daemon needs to be woken.
> + *
> + * Context: Any context.
> + */
>  void wake_up_klogd(void)
>  {
>  	__wake_up_klogd(PRINTK_PENDING_WAKEUP);
>  }
>  

Anyway, I like this change.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 04/18] printk: Add per-console suspended state
  2023-03-02 19:56 ` [PATCH printk v1 04/18] printk: Add per-console suspended state John Ogness
@ 2023-03-08 14:40   ` Petr Mladek
  2023-03-17 13:22     ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-08 14:40 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:04, John Ogness wrote:
> Currently the global @console_suspended is used to determine if
> consoles are in a suspended state. Its primary purpose is to allow
> usage of the console_lock when suspended without causing console
> printing. It is synchronized by the console_lock.
> 
> Rather than relying on the console_lock to determine suspended
> state, make it an official per-console state that is set within
> console->flags. This allows the state to be queried via SRCU.
> 
> @console_suspended will continue to exist, but now only to implement
> the console_lock/console_unlock trickery and _not_ to represent
> the suspend state of a particular console.
> 
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
>   *			receiving the printk spam for obvious reasons.
>   * @CON_EXTENDED:	The console supports the extended output format of
>   *			/dev/kmesg which requires a larger output buffer.
> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
> + *			printing callbacks must not be called.
>   */
>  enum cons_flags {
>  	CON_PRINTBUFFER		= BIT(0),
> @@ -162,6 +164,7 @@ enum cons_flags {
>  	CON_ANYTIME		= BIT(4),
>  	CON_BRL			= BIT(5),
>  	CON_EXTENDED		= BIT(6),
> +	CON_SUSPENDED		= BIT(7),

We have to show it in /proc/consoles, see fs/proc/consoles.c.
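
For reference, the flag table in show_console_dev() there would need a
new entry, e.g. (the 's' letter is just a suggestion):

	static const struct {
		short flag;
		char name;
	} con_flags[] = {
		{ CON_ENABLED,		'E' },
		{ CON_CONSDEV,		'C' },
		{ CON_BOOT,		'B' },
		{ CON_PRINTBUFFER,	'p' },
		{ CON_BRL,		'b' },
		{ CON_ANYTIME,		'a' },
		{ CON_SUSPENDED,	's' },	/* new */
	};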

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2563,10 +2563,26 @@ MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
>   */
>  void suspend_console(void)
>  {
> +	struct console *con;
> +
>  	if (!console_suspend_enabled)
>  		return;
>  	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
>  	pr_flush(1000, true);
> +
> +	console_list_lock();
> +	for_each_console(con)
> +		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
> +	console_list_unlock();
> +
> +	/*
> +	 * Ensure that all SRCU list walks have completed. All printing
> +	 * contexts must be able to see that they are suspended so that it
> +	 * is guaranteed that all printing has stopped when this function
> +	 * completes.
> +	 */
> +	synchronize_srcu(&console_srcu);
> +
>  	console_lock();
>  	console_suspended = 1;
>  	up_console_sem();
> @@ -2574,11 +2590,26 @@ void suspend_console(void)
>  
>  void resume_console(void)
>  {
> +	struct console *con;
> +
>  	if (!console_suspend_enabled)
>  		return;
>  	down_console_sem();
>  	console_suspended = 0;
>  	console_unlock();
> +
> +	console_list_lock();
> +	for_each_console(con)
> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
> +	console_list_unlock();
> +
> +	/*
> +	 * Ensure that all SRCU list walks have completed. All printing
> +	 * contexts must be able to see they are no longer suspended so
> +	 * that they are guaranteed to wake up and resume printing.
> +	 */
> +	synchronize_srcu(&console_srcu);
> +

The setting of the global "console_suspended" and per-console
CON_SUSPENDED flag is not synchronized. As a result, they might
become inconsistent:

CPU0				CPU1

suspend_console()
  console_list_lock();
  # set CON_SUSPENDED
  console_list_unlock();

				console_resume()
				  down_console_sem();
				  console_suspended = 0;
				  console_unlock();

				  console_list_lock()
				  # clear CON_SUSPENDED
				  console_list_unlock();

  console_lock();
  console_suspended = 1;
  up_console_sem();

I think that we could just remove the global "console_suspended" flag.

IMHO, it used to be needed to avoid moving the global "console_seq" forward
when the consoles were suspended. But it is not needed now with the
per-console con->seq. console_flush_all() skips consoles when
console_is_usable() fails and it bails out when there is no progress.

It seems that both console_flush_all() and console_unlock() would
handle this correctly.
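
A sketch of how the check could look (based on the current
console_is_usable(); only the CON_SUSPENDED test is new):

	static inline bool console_is_usable(struct console *con)
	{
		short flags = console_srcu_read_flags(con);

		if (!(flags & CON_ENABLED))
			return false;

		if (flags & CON_SUSPENDED)
			return false;

		if (!con->write)
			return false;

		/*
		 * Consoles without CON_ANYTIME must not be called
		 * before per-CPU resources are allocated.
		 */
		if (!cpu_online(raw_smp_processor_id()) &&
		    !(flags & CON_ANYTIME))
			return false;

		return true;
	}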

Hmm, it would change the behavior of console_lock() and console_trylock().
They would set "console_locked" and "console_may_schedule" even when
the consoles are suspended. But it should be OK:

   + "console_may_schedule" actually should be set according
     to the context where console_unlock() will be called.

   + "console_locked" seems to be used only in WARN_CONSOLE_UNLOCKED().
     I could imagine a corner case where, for example, "vt" code does
     not print the warning because it works as it works. But it does
     not make much sense. IMHO, such a code should get fixed. And it
     is just a warning after all.

>  	pr_flush(1000, true);
>  }
>  
> @@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  		}
>  		console_srcu_read_unlock(cookie);
>  
> -		/*
> -		 * If consoles are suspended, it cannot be expected that they
> -		 * make forward progress, so timeout immediately. @diff is
> -		 * still used to return a valid flush status.
> -		 */
> -		if (console_suspended)
> -			remaining = 0;

Heh, I thought that this might cause a regression, e.g. unnecessary
delays during suspend.

But it actually works because "diff" is counted only for usable
consoles. It will stay "0" if there is no usable console.

I wonder if it would make sense to add a comment somewhere,
e.g. above the later check:

+		/* diff is zero also when there is no usable console. */
		if (diff == 0 || remaining == 0)
			break;

Anyway, we should update the comment above pr_flush():

-  * Return: true if all enabled printers are caught up.
+  * Return: true if all usable printers are caught up.

> -		else if (diff != last_diff && reset_on_progress)
> +		if (diff != last_diff && reset_on_progress)
>  			remaining = timeout_ms;
>  
>  		console_unlock();

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 01/18] kdb: do not assume write() callback available
  2023-03-02 19:56 ` [PATCH printk v1 01/18] kdb: do not assume write() callback available John Ogness
  2023-03-07 14:57   ` Petr Mladek
  2023-03-07 16:34   ` Doug Anderson
@ 2023-03-09 10:52   ` Daniel Thompson
  2023-03-09 11:26     ` Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Daniel Thompson @ 2023-03-09 10:52 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Douglas Anderson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport

On Thu, Mar 02, 2023 at 09:02:01PM +0106, John Ogness wrote:
> It is allowed for consoles to provide no write() callback. For
> example ttynull does this.
>
> Check if a write() callback is available before using it.
>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>

Any thoughts on the best way to land the series? All via one tree, or can
we pick and mix?


Daniel.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 00/18] threaded/atomic console support
  2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
                   ` (18 preceding siblings ...)
  2023-03-02 19:58 ` [PATCH printk v1 00/18] serial: 8250: implement non-BKL console John Ogness
@ 2023-03-09 10:55 ` Daniel Thompson
  2023-03-09 11:14   ` John Ogness
  19 siblings, 1 reply; 92+ messages in thread
From: Daniel Thompson @ 2023-03-09 10:55 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Douglas Anderson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport, Greg Kroah-Hartman,
	linux-fsdevel, Andrew Morton, Guilherme G. Piccoli, David Gow,
	Tiezhu Yang, Daniel Vetter, tangmeng, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu

On Thu, Mar 02, 2023 at 09:02:00PM +0106, John Ogness wrote:
> Hi,
>
> This is v1 of a series to bring in a new threaded/atomic console
> infrastructure. The history, motivation, and various explanations and
> examples are available in the cover letter of tglx's RFC series
> [0]. From that series, patches 1-18 have been mainlined as of the 6.3
> merge window. What remains, patches 19-29, is what this series
> represents.

So I grabbed the whole series and pointed it at the kgdb test suite.

Don't get too excited about that (the test suite only exercises 8250
and PL011... and IIUC little in the set should impact UART polling
anyway) but FWIW:
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 00/18] threaded/atomic console support
  2023-03-09 10:55 ` [PATCH printk v1 00/18] threaded/atomic console support Daniel Thompson
@ 2023-03-09 11:14   ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-09 11:14 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Douglas Anderson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport, Greg Kroah-Hartman,
	linux-fsdevel, Andrew Morton, Guilherme G. Piccoli, David Gow,
	Tiezhu Yang, Daniel Vetter, tangmeng, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu

On 2023-03-09, Daniel Thompson <daniel.thompson@linaro.org> wrote:
> So I grabbed the whole series and pointed it at the kgdb test suite.
>
> Don't get too excited about that (the test suite only exercises 8250
> and PL011... and IIUC little in the set should impact UART polling
> anyway) but FWIW:
>
> Tested-by: Daniel Thompson <daniel.thompson@linaro.org>

One of the claims of this series is that it does not break any existing
drivers/infrastructure. So any successful test results are certainly of
value.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 01/18] kdb: do not assume write() callback available
  2023-03-09 10:52   ` Daniel Thompson
@ 2023-03-09 11:26     ` Petr Mladek
  2023-03-09 11:30       ` Daniel Thompson
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-09 11:26 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Douglas Anderson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport

On Thu 2023-03-09 10:52:40, Daniel Thompson wrote:
> On Thu, Mar 02, 2023 at 09:02:01PM +0106, John Ogness wrote:
> > It is allowed for consoles to provide no write() callback. For
> > example ttynull does this.
> >
> > Check if a write() callback is available before using it.
> >
> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> 
> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
> 
> > Any thoughts on the best way to land the series? All via one tree, or can
> we pick and mix?

I would prefer to take everything via the printk tree because
most changes are there. Otherwise, we might end up with unnecessary
cross-tree merge conflicts. Also I would know when all pieces are
there.

That said, this seems to be the only change in
kernel/debug/kdb/kdb_io.c and it is relatively independent.
So, it should not be a big problem to take it separately.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 01/18] kdb: do not assume write() callback available
  2023-03-09 11:26     ` Petr Mladek
@ 2023-03-09 11:30       ` Daniel Thompson
  0 siblings, 0 replies; 92+ messages in thread
From: Daniel Thompson @ 2023-03-09 11:30 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Douglas Anderson, Aaron Tomlin,
	Luis Chamberlain, kgdb-bugreport

On Thu, Mar 09, 2023 at 12:26:23PM +0100, Petr Mladek wrote:
> On Thu 2023-03-09 10:52:40, Daniel Thompson wrote:
> > On Thu, Mar 02, 2023 at 09:02:01PM +0106, John Ogness wrote:
> > > It is allowed for consoles to provide no write() callback. For
> > > example ttynull does this.
> > >
> > > Check if a write() callback is available before using it.
> > >
> > > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> >
> > Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
> >
> > Any thoughts on the best way to land the series? All via one tree, or can
> > we pick and mix?
>
> I would prefer to take everything via the printk tree because
> most changes are there. Otherwise, we might end up with unnecessary
> cross-tree merge conflicts. Also I would know when all pieces are
> there.
>
> That said, this seems to be the only change in
> kernel/debug/kdb/kdb_io.c and it is relatively independent.
> So, it should not be a big problem to take it separately.

Enthusiastically
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>

That suits me fine: kgdb is pretty quiet at the moment so, whilst I
can't predict what patches will show up this cycle, this probably spares
me from having to put together a PR for a single patch!


Daniel.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* global states: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-02 19:56 ` [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure John Ogness
@ 2023-03-09 14:08   ` Petr Mladek
  2023-03-17 13:29     ` John Ogness
  2023-03-09 15:32   ` naming: " Petr Mladek
  2023-03-21 16:04   ` union: was: " Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-09 14:08 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill-defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endeavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3472,6 +3492,14 @@ void register_console(struct console *newcon)
>  	newcon->dropped = 0;
>  	console_init_seq(newcon, bootcon_registered);
>  
> +	if (!(newcon->flags & CON_NO_BKL))
> +		have_bkl_console = true;

We never clear this value even when the console gets unregistered.

> +	else
> +		cons_nobkl_init(newcon);
> +
> +	if (newcon->flags & CON_BOOT)
> +		have_boot_console = true;
> +
>  	/*
>  	 * Put this console in the list - keep the
>  	 * preferred driver at the head of the list.
> @@ -3515,6 +3543,9 @@ void register_console(struct console *newcon)
>  			if (con->flags & CON_BOOT)
>  				unregister_console_locked(con);
>  		}
> +
> +		/* All boot consoles have been unregistered. */
> +		have_boot_console = false;

The boot consoles can be removed also by printk_late_init().

I would prefer to make this more error-proof and update both
have_bkl_console and have_boot_console in unregister_console().

A solution would be to use a reference counter instead of the boolean.
I am not sure if it is worth it. But it seems that refcount_read()
is just a simple atomic read, aka READ_ONCE().


>  	}
>  unlock:
>  	console_list_unlock();

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* naming: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-02 19:56 ` [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure John Ogness
  2023-03-09 14:08   ` global states: was: " Petr Mladek
@ 2023-03-09 15:32   ` Petr Mladek
  2023-03-17 13:39     ` John Ogness
  2023-03-21 16:04   ` union: was: " Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-09 15:32 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill-defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endeavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Per console locking instead of global locking
>   - Per console state which allows making informed decisions
>   - Stateful handover and takeover
> 

So, this patch adds:

	CON_NO_BKL		= BIT(8),

	struct cons_state {

	atomic_long_t		__private atomic_state[2];

	include/linux/console.h
	kernel/printk/printk_nobkl.c

	enum state_selector {
		CON_STATE_CUR,

	cons_state_set()
	cons_state_try_cmpxchg()

	cons_nobkl_init()
	cons_nobkl_cleanup()


later patches add:

	console_can_proceed(struct cons_write_context *wctxt);
	console_enter_unsafe(struct cons_write_context *wctxt);

	cons_atomic_enter()
	cons_atomic_flush();

	static bool cons_emit_record(struct cons_write_context *wctxt)


All the above names seem to be used only by the NOBKL consoles.
And they use "cons", "NO_BKL", "nobkl", "cons_atomic", "atomic", "console".

I wonder if there is a system or if the names just evolved during several
reworks.

Please, let me know if I am going too far, being too picky, and that it
is not worth it. But you know me. I think that it should help to be
more consistent. And it actually might be a good idea to separate
API specific to the NOBKL consoles.

Here is my opinion:

1. I am not even sure if "nobkl", aka "no_big_kernel_lock" is the
   major property of these consoles.

   It might get confused with the really famous big kernel lock.
   Also I tend to confuse this with "noblk", aka "non-blocking".

   I always liked the "atomic consoles" description.


2. More importantly, an easy-to-distinguish shortcut would be nice
   as a common prefix. The following comes to my mind:

   + nbcon - aka nobkl/noblk consoles API
   + acon  - atomic console API


It would look like:

a) variant with nbcon:


	CON_NB		= BIT(8),

	struct nbcon_state {
	atomic_long_t		__private atomic_nbcon_state[2];

	include/linux/console.h
	kernel/printk/nbcon.c

	enum nbcon_state_selector {
		NBCON_STATE_CUR,

	nbcon_state_set()
	nbcon_state_try_cmpxchg()

	nbcon_init()
	nbcon_cleanup()

	nbcon_can_proceed(struct cons_write_context *wctxt);
	nbcon_enter_unsafe(struct cons_write_context *wctxt);

	nbcon_enter()
	nbcon_flush_all();

	nbcon_emit_next_record()


b) variant with atomic:


	CON_ATOMIC		= BIT(8),

	struct acon_state {
	atomic_long_t		__private acon_state[2];

	include/linux/console.h
	kernel/printk/acon.c  or atomic_console.c

	enum acon_state_selector {
		ACON_STATE_CUR,

	acon_state_set()
	acon_state_try_cmpxchg()

	acon_init()
	acon_cleanup()

	acon_can_proceed(struct cons_write_context *wctxt);
	acon_enter_unsafe(struct cons_write_context *wctxt);

	acon_enter()
	acon_flush_all();

	acon_emit_next_record()


I would prefer the variant with "nbcon" because

	$> git grep nbcon | wc -l
	0

vs.

	$> git grep acon | wc -l
	11544


Again, feel free to tell me that I ask for too much. I am not
sure how complicated would be to do this mass change and if it
is worth it. I can review this patchset even with the current names.

My main concern is about the long-term maintainability. It is
always easier to review patches than a monolithic source code.
I would like to reduce the risk of people hating us for the "mess"
we made ;-)

Well, the current names might be fine when the legacy code gets
removed one day. The question is how realistic it is. Also we
probably should make them slightly more consistent anyway.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-02 19:56 ` [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic John Ogness
  2023-03-06  9:07   ` Dan Carpenter
@ 2023-03-13 16:07   ` Petr Mladek
  2023-03-17 14:56     ` John Ogness
  2023-03-17 17:34   ` simplify: was: " Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-13 16:07 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:06, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add per console acquire/release functionality. The console 'locked'
> state is a combination of several state fields:
> 
>   - The 'locked' bit
> 
>   - The 'cpu' field that denotes on which CPU the console is locked
> 
>   - The 'cur_prio' field that contains the severity of the printk
>     context that owns the console. This field is used for decisions
>     whether to attempt friendly handovers and also prevents takeovers
>     from a less severe context, e.g. to protect the panic CPU.
> 
> The acquire mechanism comes with several flavours:
> 
>   - Straightforward acquire when the console is not contended
> 
>   - Friendly handover mechanism based on a request/grant handshake
> 
>     The requesting context:
> 
>       1) Puts the desired handover state (CPU nr, prio) into a
>          separate handover state
> 
>       2) Sets the 'req_prio' field in the real console state
> 
>       3) Waits (with a timeout) for the owning context to handover
> 
>     The owning context:
> 
>       1) Observes the 'req_prio' field set
> 
>       2) Hands the console over to the requesting context by
>          switching the console state to the handover state that was
>          provided by the requester
> 
>   - Hostile takeover
> 
>       The new owner takes the console over without handshake
> 
>       This is required when friendly handovers are not possible,
>       i.e. the higher priority context interrupted the owning context
>       on the same CPU or the owning context is not able to make
>       progress on a remote CPU.
> 
> The release is the counterpart which either releases the console
> directly or hands it gracefully over to a requester.
> 
> All operations on console::atomic_state[CUR|REQ] are atomic
> cmpxchg based to handle concurrency.
> 
> The acquire/release functions implement only minimal policies:
> 
>   - Preference for higher priority contexts
>   - Protection of the panic CPU
> 
> All other policy decisions have to be made at the call sites.
> 
> The design allows implementing the well known:
> 
>     acquire()
>     output_one_line()
>     release()
> 
> algorithm, but also allows avoiding the per-line acquire/release for
> e.g. panic situations by doing the acquire once and then relying on
> the panic CPU protection for the rest.
> 
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -189,12 +201,79 @@ struct cons_state {
>  			union {
>  				u32	bits;
>  				struct {
> +					u32 locked	:  1;
> +					u32 unsafe	:  1;
> +					u32 cur_prio	:  2;
> +					u32 req_prio	:  2;
> +					u32 cpu		: 18;
>  				};
>  			};
>  		};
>  	};
>  };
>  
> +/**
> + * cons_prio - console writer priority for NOBKL consoles
> + * @CONS_PRIO_NONE:		Unused
> + * @CONS_PRIO_NORMAL:		Regular printk
> + * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
> + * @CONS_PRIO_PANIC:		Panic output
> + *
> + * Emergency output can carefully take over the console even without consent
> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
> + * output can ignore the unsafe flag as a last resort. If panic output is
> + * active no takeover is possible until the panic output releases the
> + * console.
> + */
> +enum cons_prio {
> +	CONS_PRIO_NONE = 0,
> +	CONS_PRIO_NORMAL,
> +	CONS_PRIO_EMERGENCY,
> +	CONS_PRIO_PANIC,
> +};
> +
> +struct console;
> +
> +/**
> + * struct cons_context - Context for console acquire/release
> + * @console:		The associated console
> + * @state:		The state at acquire time
> + * @old_state:		The old state when try_acquire() failed for analysis
> + *			by the caller
> + * @hov_state:		The handover state for spin and cleanup
> + * @req_state:		The request state for spin and cleanup
> + * @spinwait_max_us:	Limit for spinwait acquire
> + * @prio:		Priority of the context
> + * @hostile:		Hostile takeover requested. Cleared on normal
> + *			acquire or friendly handover
> + * @spinwait:		Spinwait on acquire if possible
> + */
> +struct cons_context {
> +	struct console		*console;
> +	struct cons_state	state;
> +	struct cons_state	old_state;
> +	struct cons_state	hov_state;
> +	struct cons_state	req_state;

This looks quite complicated. I am still trying to understand the logic.

I want to be sure that we are on the same page. Let me try to
summarize my understanding and expectations:

1. The console has two state variables (atomic_state[2]):
       + CUR == state of the current owner
       + REQ == set when anyone else requests to take over the ownership

   In addition, there are also priority bits in the state variable.
   Each state variable has cur_prio, req_prio.


2. There are 4 priorities. They describe the type of the context that
   either owns the console or would like to get the ownership.

   These priorities have the following meaning:

       + NONE: when the console is idle

       + NORMAL: the console is owned by the kthread

       + EMERGENCY: The console is called directly from printk().
	   It is used when printing some emergency messages, like
	   WARN(), watchdog splat.

       + PANIC: when console is called directly but only from
	  the CPU that is handling panic().


3. The number of contexts:

       + There is one NORMAL context used by the kthread.

       + There might eventually be more EMERGENCY contexts running
	 in parallel. Usually there is only one, but other CPUs
	 might still add more messages into the log buffer in parallel.

	 The EMERGENCY context owner is responsible for flushing
	 all pending messages.

       + There might be only one PANIC context on the panic CPU.


4. The current owner sets a flag "unsafe" when it is not safe
   to take over the lock in a hostile way.


Switching context:

We have a context with a well defined priority which tries
to get the ownership. There are a few possibilities:

a) The console is idle and the context could get the ownership
   immediately.

   It is a simple cmpxchg of con->atomic_state[CUR] (see the sketch
   after this list).


b) The console is owned by anyone with a lower priority.
   The new caller should try to take over the lock in a safe way
   when possible.

   It can be done by setting con->atomic_state[REQ] and waiting
   until the current owner makes the waiter the owner in
   con->atomic_state[CUR].


c) The console is owned by anyone with a lower priority
   on the same CPU. Or the owner on another CPU did not
   pass the lock within the timeout.

   In this case, we could steal the lock. It can be done by
   writing to con->atomic_state[CUR].

   We could do this in EMERGENCY or PANIC context when the current
   owner is not in an "unsafe" context.

   We could do this at the end of panic (console_flush_in_panic())
   even when the current owner is in an "unsafe" context.


Common rule: The caller never tries to take over the lock
    from another owner of the same priority (?)
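
To make case a) concrete, a rough sketch (state layout and helper names
from the patch, logic heavily simplified):

	static bool cons_try_acquire_direct(struct cons_context *ctxt)
	{
		struct cons_state cur = { };	/* expected: unowned */
		struct cons_state new = { };

		new.locked   = 1;
		new.cpu      = smp_processor_id();
		new.cur_prio = ctxt->prio;

		return cons_state_try_cmpxchg(ctxt->console, CON_STATE_CUR,
					      &cur, &new);
	}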


Current owner:

  + Must always do non-atomic operations in the "unsafe" context.

  + Must check if they still own the lock or if there is a request
    to pass the lock before manipulating the console state or reading
    the shared buffers.

  + Should pass the lock to a context with a higher priority.
    It must be done only in a "safe" state. But it might be in
    the middle of the record.


Passing the owner:

   + The current owner sets con->atomic_state[CUR] according
     to the info in con->atomic_state[REQ] and bails out.

   + The waiter notices that it became the owner by finding its
     requested state in con->atomic_state[CUR]

   + The most tricky situation is when the current owner
     is passing the lock and the waiter is giving up
     because of the timeout. The current owner could pass
     the lock only when the waiter is still watching.



Other:

   + Atomic consoles ignore con->seq. Instead they store the lower
     32-bit part of the sequence number in the atomic_state variable
     at least on 64-bit systems. They use get_next_seq() to guess
     the higher 32-bit part of the sequence number.
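
To make that concrete, a sketch of the reconstruction idea (hypothetical
helper name; assumes the stored value is at most 2^32 behind the
ringbuffer head):

	static u64 cons_expand_seq(u32 lo32, u64 head_seq)
	{
		/* Start in the same 2^32 "epoch" as the head sequence. */
		u64 seq = (head_seq & ~0xffffffffULL) | lo32;

		/* The low part wrapped relative to head: previous epoch. */
		if (seq > head_seq)
			seq -= 1ULL << 32;

		return seq;
	}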


Questions:

How exactly do we handle the early boot before kthreads are ready,
please? It looks like we just wait for the kthread. This looks
wrong to me.

Does the above summary describe the behavior, please?
Or does the code handle some situation another way?

> +	unsigned int		spinwait_max_us;
> +	enum cons_prio		prio;
> +	unsigned int		hostile		: 1;
> +	unsigned int		spinwait	: 1;
> +};
> +
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_check_panic - Check whether a remote CPU is in panic
> + *
> + * Returns: True if a remote CPU is in panic, false otherwise.
> + */
> +static inline bool cons_check_panic(void)
> +{
> +	unsigned int pcpu = atomic_read(&panic_cpu);
> +
> +	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> +}

This does the same as abandon_console_lock_in_panic(). I would
give it a more meaningful name and use it everywhere.

What about other_cpu_in_panic() or panic_on_other_cpu()?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem()
  2023-03-07 16:05   ` Petr Mladek
@ 2023-03-17 11:37     ` John Ogness
  2023-04-13 13:42       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-17 11:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2023-03-07, Petr Mladek <pmladek@suse.com> wrote:
> So this change would cause a non-paired console_unlock().
> And console_unlock might still deadlock on the console_sem->lock.

Yes, but at least it would have flushed beforehand.

> One solution would be to call console_flush_all() directly in
> console_flush_on_panic() without taking console_lock().
>
> It should not be worse than the current code which ignores
> the console_trylock() return value.

I think your suggestion is acceptable.

> Note that it mostly works because console_flush_on_panic() is called
> when other CPUs are supposed to be stopped.
>
> We only would need to prevent other CPUs from flushing messages
> as well if they were still running by chance. But we actually already
> do this, see abandon_console_lock_in_panic(). Well, we should
> make sure that the abandon_console_lock_in_panic() check is
> done before flushing the first message.
>
> All these changes together would prevent deadlock on
> console_sem->lock.  But the synchronization "guarantees" should stay
> the same.

We could also update console_trylock() and console_lock() to fail and
infinitely sleep, respectively, when abandon_console_lock_in_panic() is
true. That would prevent CPUs from newly acquiring the console lock and
interfering with the panic CPU.
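
Something like this, as a sketch (reusing the existing
abandon_console_lock_in_panic() check; elided parts marked):

	int console_trylock(void)
	{
		if (abandon_console_lock_in_panic())
			return 0;
		/* ... existing trylock logic ... */
	}

	void console_lock(void)
	{
		/* A CPU that is not the panic CPU sleeps here forever. */
		while (abandon_console_lock_in_panic())
			msleep(1000);
		/* ... existing lock logic ... */
	}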

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 03/18] printk: Consolidate console deferred printing
  2023-03-08 13:15   ` Petr Mladek
@ 2023-03-17 13:05     ` John Ogness
  2023-04-13 15:15       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-17 13:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2321,7 +2321,10 @@ asmlinkage int vprintk_emit(int facility, int level,
>>  		preempt_enable();
>>  	}
>>  
>> -	wake_up_klogd();
>> +	if (in_sched)
>> +		defer_console_output();
>> +	else
>> +		wake_up_klogd();
>
> Nit: I would add an empty line here. Or I would move this up into the
>      previous if (in_sched()) condition.

Empty line is ok. I do not want to move it up because the above
condition gets more complicated later. IMHO a simple if/else for
specifying what the irq_work should do is the most straight forward
here.

>> @@ -3811,11 +3814,30 @@ static void __wake_up_klogd(int val)
>>  	preempt_enable();
>>  }
>>  
>> +/**
>> + * wake_up_klogd - Wake kernel logging daemon
>> + *
>> + * Use this function when new records have been added to the ringbuffer
>> + * and the console printing for those records is handled elsewhere. In
>
> "elsewhere" is ambiguous. I would write:
>
> "and the console printing for those records maybe handled in this context".

The reason for using the word "elsewhere" is that in later patches the
printing threads can also handle it. I can change it to
"this context" for this patch, but then after adding threads I will need
to adjust the comment again. How about:

"and the console printing for those records should not be handled by the
irq_work context because another context will handle it."

> Anyway, I like this change.

Thanks.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 04/18] printk: Add per-console suspended state
  2023-03-08 14:40   ` Petr Mladek
@ 2023-03-17 13:22     ` John Ogness
  2023-04-14  9:56       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-17 13:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
>>   *			receiving the printk spam for obvious reasons.
>>   * @CON_EXTENDED:	The console supports the extended output format of
>>   *			/dev/kmesg which requires a larger output buffer.
>> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
>> + *			printing callbacks must not be called.
>>   */
>>  enum cons_flags {
>>  	CON_PRINTBUFFER		= BIT(0),
>> @@ -162,6 +164,7 @@ enum cons_flags {
>>  	CON_ANYTIME		= BIT(4),
>>  	CON_BRL			= BIT(5),
>>  	CON_EXTENDED		= BIT(6),
>> +	CON_SUSPENDED		= BIT(7),
>
> We have to show it in /proc/consoles, see fs/proc/consoles.c.

Are we supposed to show all flags in /proc/consoles? Currently
CON_EXTENDED is not shown either.

>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2574,11 +2590,26 @@ void suspend_console(void)
>>  
>>  void resume_console(void)
>>  {
>> +	struct console *con;
>> +
>>  	if (!console_suspend_enabled)
>>  		return;
>>  	down_console_sem();
>>  	console_suspended = 0;
>>  	console_unlock();
>> +
>> +	console_list_lock();
>> +	for_each_console(con)
>> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
>> +	console_list_unlock();
>> +
>> +	/*
>> +	 * Ensure that all SRCU list walks have completed. All printing
>> +	 * contexts must be able to see they are no longer suspended so
>> +	 * that they are guaranteed to wake up and resume printing.
>> +	 */
>> +	synchronize_srcu(&console_srcu);
>> +
>
> The setting of the global "console_suspended" and per-console
> CON_SUSPENDED flag is not synchronized. As a result, they might
> become inconsistent:

They do not need to be synchronized and it doesn't matter if they become
inconsistent. With this patch they are no longer related. One is for
tracking the state of the console (CON_SUSPENDED), the other is for
tracking the suspend trick for the console_lock.

> I think that we could just remove the global "console_suspended" flag.
>
> IMHO, it used to be needed to avoid moving the global "console_seq" forward
> when the consoles were suspended. But it is not needed now with the
> per-console con->seq. console_flush_all() skips consoles when
> console_is_usable() fails and it bails out when there is no progress.

The @console_suspended flag is used to allow console_lock/console_unlock
to be called without triggering printing. This is probably so that vt
code can make use of the console_lock for its own internal locking, even
when in a state where ->write() should not be called. I would expect we
still need it, even if the consoles do not.

The only valid criteria for allowing to call ->write() is CON_SUSPENDED.

>> @@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>>  		}
>>  		console_srcu_read_unlock(cookie);
>>  
>> -		/*
>> -		 * If consoles are suspended, it cannot be expected that they
>> -		 * make forward progress, so timeout immediately. @diff is
>> -		 * still used to return a valid flush status.
>> -		 */
>> -		if (console_suspended)
>> -			remaining = 0;
>
> I wonder if it would make sense to add a comment somewhere,
> e.g. above the later check:
>
> +		/* diff is zero also when there is no usable console. */
> 		if (diff == 0 || remaining == 0)
> 			break;

I think that is obvious, but I can add a similar comment to remind the
reader that only usable consoles are counted.

> Anyway, we should update the comment above pr_flush():
>
> -  * Return: true if all enabled printers are caught up.
> +  * Return: true if all usable printers are caught up.

Nice catch.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: global states: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-09 14:08   ` global states: was: " Petr Mladek
@ 2023-03-17 13:29     ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-17 13:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On 2023-03-09, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3472,6 +3492,14 @@ void register_console(struct console *newcon)
>>  	newcon->dropped = 0;
>>  	console_init_seq(newcon, bootcon_registered);
>>  
>> +	if (!(newcon->flags & CON_NO_BKL))
>> +		have_bkl_console = true;
>
> We never clear this value even when the console gets unregistered.

OK. I'll allow it to be cleared on unregister by checking the registered
list.

>> @@ -3515,6 +3543,9 @@ void register_console(struct console *newcon)
>>  			if (con->flags & CON_BOOT)
>>  				unregister_console_locked(con);
>>  		}
>> +
>> +		/* All boot consoles have been unregistered. */
>> +		have_boot_console = false;
>
> The boot consoles can be removed also by printk_late_init().
>
> I would prefer to make this more error-proof and update both
> have_bkl_console and have_boot_console in unregister_console().

OK.

> A solution would be to use a reference counter instead of the boolean.
> I am not sure if it is worth it. But it seems that refcount_read()
> is just a simple atomic read, aka READ_ONCE().

Well, we are holding the console_list_lock, so we can just iterate over
the list. Iteration happens later in the series anyway, in order to
create/run the NOBKL threads.
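
For example, a sketch of a helper that unregister_console() could call
(hypothetical name; assumes the console_list_lock is held):

	static void cons_update_global_state(void)
	{
		struct console *con;

		have_bkl_console = false;
		have_boot_console = false;

		for_each_console(con) {
			if (!(con->flags & CON_NO_BKL))
				have_bkl_console = true;
			if (con->flags & CON_BOOT)
				have_boot_console = true;
		}
	}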

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: naming: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-09 15:32   ` naming: " Petr Mladek
@ 2023-03-17 13:39     ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-17 13:39 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On 2023-03-09, Petr Mladek <pmladek@suse.com> wrote:
> So, this patch adds:
>
> [...]
>
> All the above names seem to be used only by the NOBKL consoles.
> And they use "cons", "NO_BKL", "nobkl", "cons_atomic", "atomic", "console".
>
> I wonder if there is a system or if the names just evolved during several
> reworks.

Yes. And because we didn't really know how to name these different yet
related pieces.

Note that the "atomic" usage is really only referring to the things
related to the atomic part of the console. The console also has a
threaded component.

>    ... an easy-to-distinguish shortcut would be nice
>    as a common prefix. The following comes to my mind:
>
>    + nbcon - aka nobkl/noblk consoles API
>    + acon  - atomic console API

"acon" is not really appropriate because they are threaded+atomic
consoles, not just atomic consoles. But it probably isn't necessary to
have separate atomic and threaded API forms. The atomic can still be
used as (for example) nbcon_atomic_enter().

> It would look like:
>
> a) variant with nbcom:
>
>
> 	CON_NB		= BIT(8),
>
> 	struct nbcon_state {
> 	atomic_long_t		__private atomic_nbcon_state[2];
>
> 	include/linux/console.h
> 	kernel/printk/nbcon.c
>
> 	enum nbcon_state_selector {
> 		NBCON_STATE_CUR,
>
> 	nbcon_state_set()
> 	nbcon_state_try_cmpxchg()
>
> 	nbcon_init()
> 	nbcon_cleanup()
>
> 	nbcon_can_proceed(struct cons_write_context *wctxt);
> 	nbcon_enter_unsafe(struct cons_write_context *wctxt);
>
> 	nbcon_enter()
> 	nbcon_flush_all();
>
> 	nbcon_emit_next_record()
>
> I would prefer the variant with "nbcon" because
>
> 	$> git grep nbcon | wc -l
> 	0

I also prefer "nbcon". Thanks for finding a name and unique string for
this new code. I will also rename the file to printk_nbcon.c.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-13 16:07   ` Petr Mladek
@ 2023-03-17 14:56     ` John Ogness
  2023-03-20 16:10       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-17 14:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Hi Petr,

On oftc#printk you mentioned that I do not need to go into details
here. But I would like to confirm your understanding and clarify some
minor details.

On 2023-03-13, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -189,12 +201,79 @@ struct cons_state {
>>  			union {
>>  				u32	bits;
>>  				struct {
>> +					u32 locked	:  1;
>> +					u32 unsafe	:  1;
>> +					u32 cur_prio	:  2;
>> +					u32 req_prio	:  2;
>> +					u32 cpu		: 18;
>>  				};
>>  			};
>>  		};
>>  	};
>>  };
>>  
>> +/**
>> + * cons_prio - console writer priority for NOBKL consoles
>> + * @CONS_PRIO_NONE:		Unused
>> + * @CONS_PRIO_NORMAL:		Regular printk
>> + * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
>> + * @CONS_PRIO_PANIC:		Panic output
>> + *
>> + * Emergency output can carefully take over the console even without consent
>> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
>> + * output can ignore the unsafe flag as a last resort. If panic output is
>> + * active no takeover is possible until the panic output releases the
>> + * console.
>> + */
>> +enum cons_prio {
>> +	CONS_PRIO_NONE = 0,
>> +	CONS_PRIO_NORMAL,
>> +	CONS_PRIO_EMERGENCY,
>> +	CONS_PRIO_PANIC,
>> +};
>> +
>> +struct console;
>> +
>> +/**
>> + * struct cons_context - Context for console acquire/release
>> + * @console:		The associated console
>> + * @state:		The state at acquire time
>> + * @old_state:		The old state when try_acquire() failed for analysis
>> + *			by the caller
>> + * @hov_state:		The handover state for spin and cleanup
>> + * @req_state:		The request state for spin and cleanup
>> + * @spinwait_max_us:	Limit for spinwait acquire
>> + * @prio:		Priority of the context
>> + * @hostile:		Hostile takeover requested. Cleared on normal
>> + *			acquire or friendly handover
>> + * @spinwait:		Spinwait on acquire if possible
>> + */
>> +struct cons_context {
>> +	struct console		*console;
>> +	struct cons_state	state;
>> +	struct cons_state	old_state;
>> +	struct cons_state	hov_state;
>> +	struct cons_state	req_state;
>
> This looks quite complicated. I am still trying to understand the logic.
>
> I want to be sure that we are on the same page. Let me try to
> summarize my understanding and expectations:
>
> 1. The console has two state variables (atomic_state[2]):
>        + CUR == state of the current owner
>        + REQ == set when anyone else requests to take over the ownership
>
>    In addition, there are also priority bits in the state variable.
>    Each state variable has cur_prio, req_prio.

Correct.

> 2. There are 4 priorities. They describe the type of the context that is
>    either owning the console or which would like to get the
>    ownership.

Yes, however (and I see now the kerneldoc is not very clear about this),
the priorities are not really about _printing_ on the console, but
instead about _owning_ the console. This is an important distinction
because console drivers will also acquire the console for non-printing
activities (such as setting up their baud rate, etc.).
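
For illustration, such a non-printing acquire could look roughly like
this (using the struct fields and the cons_try_acquire()/cons_release()
pair from this series; set_baud_rate() is a made-up placeholder):

	struct cons_context ctxt = {
		.console = con,
		.prio	 = CONS_PRIO_NORMAL,
	};

	if (cons_try_acquire(&ctxt)) {
		set_baud_rate(port, 115200);	/* placeholder register work */
		cons_release(&ctxt);
	}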

>    These priorities have the following meaning:
>
>        + NONE: when the console is idle

"unowned" is a better term than "idle".

>        + NORMAL: the console is owned by the kthread

NORMAL really means ownership for normal usage (i.e. an owner that is
not in an emergency or panic situation).

>        + EMERGENCY: The console is called directly from printk().
> 	   It is used when printing some emergency messages, like
> 	   WARN(), watchdog splat.

This priority of ownership will only be used when printing emergency
messages. It does not mean that printk() does direct printing. The
atomic printing occurs as a flush when releasing the ownership. This
allows the full backtrace to go into the ringbuffer before flushing (as
we decided at LPC2022).

>        + PANIC: when console is called directly but only from
> 	  the CPU that is handling panic().

This priority really has the same function as EMERGENCY, but is a higher
priority so that console ownership can always be taken (hostile if
necessary) in a panic situation. This priority of ownership will only be
used on the panic CPU.

> 3. The number of contexts:
>
>        + There is one NORMAL context used by the kthread.

NORMAL defines the priority of the ownership. So it is _all_ owning
contexts (not just printing contexts) that are not EMERGENCY or PANIC.

>        + There might be eventually more EMERGENCY contexts running
> 	 in parallel. Usually there is only one but other CPUs
> 	 might still add more messages into the log buffer in parallel.
>
> 	 The EMERGENCY context owner is responsible for flushing
> 	 all pending messages.

Yes, but you need to be careful how you view this. There might be more
contexts with emergency messages, but only one owner with the EMERGENCY
priority. The other contexts will fail to acquire the console and only
dump their messages into the ringbuffer. The one EMERGENCY owner will
flush all messages when ownership is released.

>        + There might be only one PANIC context on the panic CPU.

There is only one PANIC owner. (There is only ever at most one owner.)
But also there should only be one CPU with panic messages. @panic_cpu
should take care of that.

> 4. The current owner sets a flag "unsafe" when it is not safe
>    to take over the lock a hostile way.

Yes. These owners will be console drivers that are touching registers
that affect their write_thread() and write_atomic() callback
code. Theoretically the drivers could also use EMERGENCY or PANIC
priorities to make sure those situations do not steal the console from
them. But for now drivers should only use NORMAL.

> Switching context:
>
> We have a context with a well defined priority which tries
> to get the ownership. There are few possibilities:
>
> a) The console is idle and the context could get the ownership
>    immediately.
>
>    It is a simple cmpxchg of con->atomic_state[CUR].

Yes, although "unowned" is a better term than "idle". The console might
be un-idle (playing with registers), but those registers do not affect
its write_thread() and write_atomic() callbacks.

> b) The console is owned by anyone with a lower priority.
>    The new caller should try to take over the lock a safe way
>    when possible.
>
>    It can be done by setting con->atomic_state[REQ] and waiting
>    until the current owner makes him the owner in
>    con->atomic_state[CUR].

Correct. And the requester can set a timeout for how long it will wait
at most.
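
For example, a waiter might fill in its acquire context like this (the
fields are from this patch, the timeout value is made up):

	struct cons_context ctxt = {
		.console	  = con,
		.prio		  = CONS_PRIO_EMERGENCY,
		.spinwait	  = 1,
		.spinwait_max_us  = 2000,	/* example timeout */
	};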

> c) The console is owned by anyone with a lower priority
>    on the same CPU. Or the owner on an other CPU did not
>    pass the lock within the timeout.

(...and the owner on the other CPU is also a lower priority)

>    In this case, we could steal the lock. It can be done by
>    writing to con->atomic_state[CUR].
>
>    We could do this in EMERGENCY or PANIC context when the current
>    owner is not in an "unsafe" context.

(...and the current owner has a lower priority)

>    We could do this at the end of panic (console_flush_in_panic())
>    even when the current owner is in an "unsafe" context.
>
> Common rule: The caller never tries to take over the lock
>     from another owner of the same priority (?)

Correct. Although I could see there being an argument to let an
EMERGENCY priority take over another EMERGENCY. For example, an owning
EMERGENCY CPU could hang and another CPU triggers the NMI stall message
(also considered emergency messages), in which case it would be helpful
to take over ownership from the hung CPU in order to finish flushing.

> Current owner:
>
>   + Must always do non-atomic operations in the "unsafe" context.

Each driver must decide for itself how it defines unsafe. But generally
speaking it will be a block of code involving modifying multiple
registers.

>   + Must check if they still own the lock or if there is a request
>     to pass the lock before manipulating the console state or reading
>     the shared buffers.

... or continuing to touch its registers.

>   + Should pass the lock to a context with a higher priority.
>     It must be done only in a "safe" state. But it might be in
>     the middle of the record.

The function to check also handles the handing over. So a console
driver, when checking, may suddenly see that it is no longer the owner
and must either carefully back out or re-acquire ownership to finish
what it started. (For example, for the 8250, if an owning context
disabled interrupts and then lost ownership, it _must_ re-acquire the
console to re-enable the interrupts.)
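
As a rough sketch of that driver-side pattern (nbcon_can_proceed() is
the checking function named earlier in this thread; clear_ier() stands
in for the real register access):

	clear_ier(up);		/* unsafe: affects the write callbacks */

	if (!nbcon_can_proceed(wctxt)) {
		/*
		 * Ownership was handed over or taken. Back out
		 * carefully: ownership must be re-acquired before the
		 * interrupts can be re-enabled.
		 */
		return;
	}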

> Passing the owner:
>
>    + The current owner sets con->atomic_state[CUR] according
>      to the info in con->atomic_state[REQ] and bails out.
>
>    + The waiter notices that it became the owner by finding its
>      requested state in con->atomic_state[CUR]
>
>    + The most tricky situation is when the current owner
>      is passing the lock and the waiter is giving up
>      because of the timeout. The current owner could pass
>      the lock only when the waiter is still watching.

Yes, yes, and yes. Since the waiter must remove its request from
con->atomic_state[CUR] before giving up, it guarantees the current owner
will see that the waiter is gone because any cmpxchg will fail and the
current owner will need to re-read con->atomic_state[CUR] (in which case
it sees there is no waiter).
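
To see why this works, both sides can be modeled as a cmpxchg on the
same word. A minimal userspace sketch of the idea (C11 atomics, not
the kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint32_t state_cur;	/* models con->atomic_state[CUR] */

/* Waiter: clear its request before giving up. */
static bool waiter_give_up(uint32_t seen, uint32_t without_req)
{
	/* Succeeds only if the owner has not handed over yet. */
	return atomic_compare_exchange_strong(&state_cur, &seen,
					      without_req);
}

/* Owner: hand over only while the request is still in place. */
static bool owner_hand_over(uint32_t seen, uint32_t waiter_state)
{
	/* Fails once the waiter has cleared its request. */
	return atomic_compare_exchange_strong(&state_cur, &seen,
					      waiter_state);
}

Racing from the same observed value, only one of the two cmpxchg calls
can succeed; the loser must re-read the state and then sees what
happened.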

> Other:
>
>    + Atomic consoles ignore con->seq. Instead they store the lower
>      32-bit part of the sequence number in the atomic_state variable
>      at least on 64-bit systems. They use get_next_seq() to guess
>      the higher 32-bit part of the sequence number.

Yes, because con->seq is protected by the console_lock, which nbcons do
not use. Note that pr_flush() relies on the console_lock, so it also
takes that opportunity to sync con->seq with con->atomic_state[CUR].seq,
thus allowing pr_flush() to only care about con->seq. pr_flush() is the
only function that cares about the sequence number for both legacy and
nbcon consoles during runtime.
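
As a sketch of that reconstruction (my reading of the idea behind
get_next_seq(), not its actual code): take a nearby 64-bit reference,
e.g. the current ringbuffer head, and let 32-bit arithmetic absorb the
wrap:

#include <stdint.h>

/*
 * Extend a stored 32-bit sequence to 64 bits using a nearby 64-bit
 * reference. Valid while the stored value lags the reference by less
 * than 2^32 records.
 */
static uint64_t guess_seq64(uint32_t lo, uint64_t ref)
{
	/* The 32-bit subtraction wraps naturally. */
	return ref - (uint32_t)((uint32_t)ref - lo);
}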

> Questions:
>
> How exactly do we handle the early boot before kthreads are ready,
> please? It looks like we just wait for the kthread.

Every vprintk_emit() will call into cons_atomic_flush(), which will
atomically flush the consoles if their threads do not exist. Looking at
the code, I see it deserves a comment about this (inside the
for_each_console_srcu loop in cons_atomic_flush()).
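
Roughly like this (a sketch of the described behavior; the exact shape
of the per-console helper is my assumption):

	for_each_console_srcu(con) {
		/*
		 * Early boot (and shutdown): without a printer kthread,
		 * flush this console atomically right here.
		 */
		if (!con->kthread)
			cons_atomic_flush_con(con);
	}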

> Does the above summary describe the behavior, please?
> Or does the code handle some situation another way?

Generally speaking, you have a pretty good picture. I think the only
thing that was missing was the concept that non-printing code (in
console drivers) will also acquire the console at times.

>> --- a/kernel/printk/printk_nobkl.c
>> +++ b/kernel/printk/printk_nobkl.c
>> +/**
>> + * cons_check_panic - Check whether a remote CPU is in panic
>> + *
>> + * Returns: True if a remote CPU is in panic, false otherwise.
>> + */
>> +static inline bool cons_check_panic(void)
>> +{
>> +	unsigned int pcpu = atomic_read(&panic_cpu);
>> +
>> +	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
>> +}
>
> This does the same as abandon_console_lock_in_panic(). I would
> give it some more meaningful name and use it everywhere.
>
> What about other_cpu_in_panic() or panic_on_other_cpu()?

I prefer the first because it sounds more like a query than a
command.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* simplify: was: Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-02 19:56 ` [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic John Ogness
  2023-03-06  9:07   ` Dan Carpenter
  2023-03-13 16:07   ` Petr Mladek
@ 2023-03-17 17:34   ` Petr Mladek
  2023-03-21 15:36     ` Petr Mladek
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-17 17:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Hi,

I am sending this before reading today's answers about the basic rules.

I have spent a few days on this answer and I do not want to delay
it indefinitely. It documents my initial impressions of the code.
It also describes some ideas that might or might not be useful
anyway.

Also there is a POC that slightly modifies the logic. But the basic
approach remains the same.

On Thu 2023-03-02 21:02:06, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add per console acquire/release functionality. The console 'locked'
> state is a combination of several state fields:
> 
>   - The 'locked' bit
> 
>   - The 'cpu' field that denotes on which CPU the console is locked
> 
>   - The 'cur_prio' field that contains the severity of the printk
>     context that owns the console. This field is used for decisions
>     whether to attempt friendly handovers and also prevents takeovers
>     from a less severe context, e.g. to protect the panic CPU.
> 
> The acquire mechanism comes with several flavours:
> 
>   - Straight forward acquire when the console is not contended
> 
>   - Friendly handover mechanism based on a request/grant handshake
> 
>     The requesting context:
> 
>       1) Puts the desired handover state (CPU nr, prio) into a
>          separate handover state
> 
>       2) Sets the 'req_prio' field in the real console state
> 
>       3) Waits (with a timeout) for the owning context to handover
> 
>     The owning context:
> 
>       1) Observes the 'req_prio' field set
> 
>       2) Hands the console over to the requesting context by
>          switching the console state to the handover state that was
>          provided by the requester
> 
>   - Hostile takeover
> 
>       The new owner takes the console over without handshake
> 
>       This is required when friendly handovers are not possible,
>       i.e. the higher priority context interrupted the owning context
>       on the same CPU or the owning context is not able to make
>       progress on a remote CPU.
> 
> The release is the counterpart which either releases the console
> directly or hands it gracefully over to a requester.
> 
> All operations on console::atomic_state[CUR|REQ] are atomic
> cmpxchg based to handle concurrency.
> 
> The acquire/release functions implement only minimal policies:
> 
>   - Preference for higher priority contexts
>   - Protection of the panic CPU
> 
> All other policy decisions have to be made at the call sites.
> 
> The design allows to implement the well known:
> 
>     acquire()
>     output_one_line()
>     release()
> 
> algorithm, but also allows to avoid the per line acquire/release for
> e.g. panic situations by doing the acquire once and then relying on
> the panic CPU protection for the rest.
> 
> Co-developed-by: John Ogness <john.ogness@linutronix.de>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
> ---
>  include/linux/console.h      |  82 ++++++
>  kernel/printk/printk_nobkl.c | 531 +++++++++++++++++++++++++++++++++++
>  2 files changed, 613 insertions(+)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index b9d2ad580128..2c95fcc765e6 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -176,8 +176,20 @@ enum cons_flags {
>   * @seq:	Sequence for record tracking (64bit only)
>   * @bits:	Compound of the state bits below
>   *
> + * @locked:	Console is locked by a writer
> + * @unsafe:	Console is busy in a non takeover region
> + * @cur_prio:	The priority of the current output
> + * @req_prio:	The priority of a handover request
> + * @cpu:	The CPU on which the writer runs
> + *
>   * To be used for state read and preparation of atomic_long_cmpxchg()
>   * operations.
> + *
> + * The @req_prio field is particularly important to allow spin-waiting to
> + * timeout and give up without the risk of it being assigned the lock
> + * after giving up. The @req_prio field has a nice side-effect in that it
> + * also makes a single read+cmpxchg possible in the common case of
> + * acquire and release.
>   */
>  struct cons_state {
>  	union {
> @@ -189,12 +201,79 @@ struct cons_state {
>  			union {
>  				u32	bits;
>  				struct {
> +					u32 locked	:  1;

Is this bit really necessary?

The console is locked when con->atomic_state[CUR] != 0.
This check gives the same information.

Motivation:

The code is quite complex by definition. It implements
a sleeping lock with spinlock-like waiting with timeout,
priorities, voluntary and hostile take-over.

The main logic is easier than the lockless printk ringbuffer.
But it was still hard for me to understand it. And I am still
not sure if it is OK.

One big problem is the manipulation of cons_state. It includes
a lot of information. And I am never sure if we compare
the right bits and if we pass the right value to cmpxchg.

Any simplification might help. And an extra bit that does not
bring extra information looks like an unnecessary complication.
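
In other words, a sketch of the equivalent test (checking .bits rather
than the whole atom so that the sequence preserved on 64-bit does not
get in the way):

	static inline bool cons_state_is_locked(struct cons_state state)
	{
		/* No dedicated bit needed: no owner means no state bits. */
		return state.bits != 0;
	}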

> +					u32 unsafe	:  1;
> +					u32 cur_prio	:  2;
> +					u32 req_prio	:  2;
> +					u32 cpu		: 18;
>  				};
>  			};
>  		};
>  	};
>  };
>  
> +/**
> + * cons_prio - console writer priority for NOBKL consoles
> + * @CONS_PRIO_NONE:		Unused
> + * @CONS_PRIO_NORMAL:		Regular printk
> + * @CONS_PRIO_EMERGENCY:	Emergency output (WARN/OOPS...)
> + * @CONS_PRIO_PANIC:		Panic output
> + *
> + * Emergency output can carefully take over the console even without consent
> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
> + * output can ignore the unsafe flag as a last resort. If panic output is
> + * active no takeover is possible until the panic output releases the
> + * console.
> + */
> +enum cons_prio {
> +	CONS_PRIO_NONE = 0,
> +	CONS_PRIO_NORMAL,
> +	CONS_PRIO_EMERGENCY,
> +	CONS_PRIO_PANIC,
> +};
> +
> +struct console;
> +
> +/**
> + * struct cons_context - Context for console acquire/release
> + * @console:		The associated console

There are 4 state variables below. It is really hard to
understand what information they include, when they
are updated, and why we need to keep them.

I'll describe how they confused me:

> + * @state:		The state at acquire time

This sounds like it is a copy of con->atomic_state[CUR] read
before trying to acquire it.

But the code copies there some new state via
copy_full_state(ctxt->state, new);

> + * @old_state:		The old state when try_acquire() failed for analysis
> + *			by the caller

This sounds like a copy of con->atomic_state[CUR] when
cmpxchg failed. It means that @state is probably
no longer valid.

It sounds strange that we would need to have remember
both values. So, I guess that the meaning is different.

> + * @hov_state:		The handover state for spin and cleanup

It sounds like the value that we put into con->atomic_state[REQ].

> + * @req_state:		The request state for spin and cleanup

This is quite confusing. It is req_state but it seems to be yet
another cache of con->atomic_state[CUR].

Now, a better name and a better explanation might help. But even better
might be to avoid/remove some of these values. I have some ideas, see
below.


> + * @spinwait_max_us:	Limit for spinwait acquire
> + * @prio:		Priority of the context
> + * @hostile:		Hostile takeover requested. Cleared on normal
> + *			acquire or friendly handover
> + * @spinwait:		Spinwait on acquire if possible
> + */
> +struct cons_context {
> +	struct console		*console;
> +	struct cons_state	state;
> +	struct cons_state	old_state;
> +	struct cons_state	hov_state;
> +	struct cons_state	req_state;
> +	unsigned int		spinwait_max_us;
> +	enum cons_prio		prio;
> +	unsigned int		hostile		: 1;
> +	unsigned int		spinwait	: 1;
> +};
> +
> +/**
> + * struct cons_write_context - Context handed to the write callbacks
> + * @ctxt:	The core console context
> + * @outbuf:	Pointer to the text buffer for output
> + * @len:	Length to write
> + * @unsafe:	Invoked in unsafe state due to force takeover
> + */
> +struct cons_write_context {
> +	struct cons_context	__private ctxt;
> +	char			*outbuf;
> +	unsigned int		len;
> +	bool			unsafe;
> +};
> +
>  /**
>   * struct console - The console descriptor structure
>   * @name:		The name of the console driver
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -4,6 +4,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/console.h>
> +#include <linux/delay.h>
>  #include "internal.h"
>  /*
>   * Printk implementation for consoles that do not depend on the BKL style
> @@ -112,6 +113,536 @@ static inline bool cons_state_try_cmpxchg(struct console *con,
>  				       &old->atom, new->atom);
>  }
>  
> +/**
> + * cons_state_full_match - Check whether the full state matches
> + * @cur:	The state to check
> + * @prev:	The previous state
> + *
> + * Returns: True if matching, false otherwise.
> + *
> + * Check the full state including state::seq on 64bit. For take over
> + * detection.
> + */
> +static inline bool cons_state_full_match(struct cons_state cur,
> +					 struct cons_state prev)
> +{
> +	/*
> +	 * req_prio can be set by a concurrent writer for friendly
> +	 * handover. Ignore it in the comparison.
> +	 */
> +	cur.req_prio = prev.req_prio;
> +	return cur.atom == prev.atom;

This function seems to be used to check if the context is still the same.
Is it really important to check the entire atom, please?

The current owner should be defined by CPU number and the priority.
Anything else is information that we want to change atomically
together with the owner. But it should not really be needed
to identify the owner.

My motivation is to make it clear what we are testing.
cons_state_full_match() and cons_state_bits_match() have vague names.
They play games with cur.req_prio. The result depends on which
state we are testing against.

Also I think that it is one reason why we need the 4 cons_state variables
in struct cons_context. We need to check against particular
full cons_state variables.

I am not 100% sure but I think that checking particular fields
might make things more straightforward. I have more ideas,
see below.

> +
> +/**
> + * cons_state_bits_match - Check for matching state bits
> + * @cur:	The state to check
> + * @prev:	The previous state
> + *
> + * Returns: True if state matches, false otherwise.
> + *
> + * Contrary to cons_state_full_match this checks only the bits and ignores
> + * a sequence change on 64bits. On 32bit the two functions are identical.
> + */
> +static inline bool cons_state_bits_match(struct cons_state cur, struct cons_state prev)
> +{
> +	/*
> +	 * req_prio can be set by a concurrent writer for friendly
> +	 * handover. Ignore it in the comparison.
> +	 */
> +	cur.req_prio = prev.req_prio;
> +	return cur.bits == prev.bits;
> +}
> +
> +/**
> + * cons_check_panic - Check whether a remote CPU is in panic
> + *
> + * Returns: True if a remote CPU is in panic, false otherwise.
> + */
> +static inline bool cons_check_panic(void)
> +{
> +	unsigned int pcpu = atomic_read(&panic_cpu);
> +
> +	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> +}
> +
> +/**
> + * cons_cleanup_handover - Cleanup a handover request
> + * @ctxt:	Pointer to acquire context
> + *
> + * @ctxt->hov_state contains the state to clean up
> + */
> +static void cons_cleanup_handover(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +	struct cons_state new;
> +
> +	/*
> +	 * No loop required. Either hov_state is still the same or
> +	 * not.
> +	 */
> +	new.atom = 0;
> +	cons_state_try_cmpxchg(con, CON_STATE_REQ, &ctxt->hov_state, &new);
> +}
> +
> +/**
> + * cons_setup_handover - Setup a handover request
> + * @ctxt:	Pointer to acquire context
> + *
> + * Returns: True if a handover request was setup, false otherwise.
> + *
> + * On success @ctxt->hov_state contains the requested handover state
> + *
> + * On failure this context is not allowed to request a handover from the
> + * current owner. Reasons would be priority too low or a remote CPU in panic.
> + * In both cases this context should give up trying to acquire the console.
> + */
> +static bool cons_setup_handover(struct cons_context *ctxt)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	struct console *con = ctxt->console;
> +	struct cons_state old;
> +	struct cons_state hstate = {
> +		.locked		= 1,
> +		.cur_prio	= ctxt->prio,
> +		.cpu		= cpu,
> +	};
> +
> +	/*
> +	 * Try to store hstate in @con->atomic_state[REQ]. This might
> +	 * race with a higher priority waiter.
> +	 */
> +	cons_state_read(con, CON_STATE_REQ, &old);
> +	do {
> +		if (cons_check_panic())
> +			return false;
> +
> +		/* Same or higher priority waiter exists? */
> +		if (old.cur_prio >= ctxt->prio)
> +			return false;
> +
> +	} while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &old, &hstate));
> +
> +	/* Save that state for comparison in spinwait */
> +	copy_full_state(ctxt->hov_state, hstate);
> +	return true;
> +}
> +
> +/**
> + * cons_setup_request - Setup a handover request in state[CUR]
> + * @ctxt:	Pointer to acquire context
> + * @old:	The state that was used to make the decision to spin wait
> + *
> + * Returns: True if a handover request was setup in state[CUR], false
> + * otherwise.
> + *
> + * On success @ctxt->req_state contains the request state that was set in
> + * state[CUR]
> + *
> + * On failure this context encountered unexpected state values. This
> + * context should retry the full handover request setup process (the
> + * handover request setup by cons_setup_handover() is now invalidated
> + * and must be performed again).
> + */
> +static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
> +{
> +	struct console *con = ctxt->console;
> +	struct cons_state cur;
> +	struct cons_state new;
> +
> +	/* Now set the request in state[CUR] */
> +	cons_state_read(con, CON_STATE_CUR, &cur);
> +	do {
> +		if (cons_check_panic())
> +			goto cleanup;
> +
> +		/* Bit state changed vs. the decision to spinwait? */
> +		if (!cons_state_bits_match(cur, old))
> +			goto cleanup;
> +
> +		/*
> +		 * A higher or equal priority context already setup a
> +		 * request?
> +		 */
> +		if (cur.req_prio >= ctxt->prio)
> +			goto cleanup;
> +
> +		/* Setup a request for handover. */
> +		copy_full_state(new, cur);
> +		new.req_prio = ctxt->prio;
> +	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
> +
> +	/* Save that state for comparison in spinwait */
> +	copy_bit_state(ctxt->req_state, new);
> +	return true;
> +
> +cleanup:
> +	cons_cleanup_handover(ctxt);
> +	return false;
> +}
> +
> +/**
> + * cons_try_acquire_spin - Complete the spinwait attempt
> + * @ctxt:	Pointer to an acquire context that contains
> + *		all information about the acquire mode
> + *
> + * @ctxt->hov_state contains the handover state that was set in
> + * state[REQ]
> + * @ctxt->req_state contains the request state that was set in
> + * state[CUR]
> + *
> + * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
> + * unexpected state values.
> + *
> + * On success @ctxt->state contains the new state that was set in
> + * state[CUR]
> + *
> + * On -EBUSY failure this context timed out. This context should either
> + * give up or attempt a hostile takeover.
> + *
> + * On -EAGAIN failure this context encountered unexpected state values.
> + * This context should retry the full handover request setup process (the
> + * handover request setup by cons_setup_handover() is now invalidated and
> + * must be performed again).
> + */
> +static bool cons_try_acquire_spin(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +	struct cons_state cur;
> +	struct cons_state new;
> +	int err = -EAGAIN;
> +	int timeout;
> +
> +	/* Now wait for the other side to hand over */
> +	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
> +		/* Timeout immediately if a remote panic is detected. */
> +		if (cons_check_panic())
> +			break;
> +
> +		cons_state_read(con, CON_STATE_CUR, &cur);
> +
> +		/*
> +		 * If the real state of the console matches the handover state
> +		 * that this context setup, then the handover was a success
> +		 * and this context is now the owner.
> +		 *
> +		 * Note that this might have raced with a new higher priority
> +		 * requester coming in after the lock was handed over.
> +		 * However, that requester will see that the owner changes and
> +		 * setup a new request for the current owner (this context).
> +		 */
> +		if (cons_state_bits_match(cur, ctxt->hov_state))
> +			goto success;
> +
> +		/*
> +		 * If state changed since the request was made, give up as
> +		 * it is no longer consistent. This must include
> +		 * state::req_prio since there could be a higher priority
> +		 * request available.
> +		 */
> +		if (cur.bits != ctxt->req_state.bits)

IMHO, this would fail when the .unsafe flag (added later) has another
value. But it does not mean that we should stop waiting.

This is one example where comparing all bits makes things tricky.


> +			goto cleanup;
> +
> +		/*
> +		 * Finally check whether the handover state is still
> +		 * the same.
> +		 */
> +		cons_state_read(con, CON_STATE_REQ, &cur);
> +		if (cur.atom != ctxt->hov_state.atom)
> +			goto cleanup;
> +
> +		/* Account time */
> +		if (timeout > 0)
> +			udelay(1);
> +	}
> +
> +	/*
> +	 * Timeout. Cleanup the handover state and carefully try to reset
> +	 * req_prio in the real state. The reset is important to ensure
> +	 * that the owner does not hand over the lock after this context
> +	 * has given up waiting.
> +	 */
> +	cons_cleanup_handover(ctxt);
> +
> +	cons_state_read(con, CON_STATE_CUR, &cur);
> +	do {
> +		/*
> +		 * The timeout might have raced with the owner coming late
> +		 * and handing it over gracefully.
> +		 */
> +		if (cons_state_bits_match(cur, ctxt->hov_state))
> +			goto success;
> +
> +		/*
> +		 * Validate that the state matches with the state at request
> +		 * time. If this check fails, there is already a higher
> +		 * priority context waiting or the owner has changed (either
> +		 * by higher priority or by hostile takeover). In all fail
> +		 * cases this context is no longer in line for a handover to
> +		 * take place, so no reset is necessary.
> +		 */
> +		if (cur.bits != ctxt->req_state.bits)
> +			goto cleanup;

Again, this might give a wrong result because of the .unsafe bit.

Also "goto cleanup" looks superfluous. cons_cleanup_handover() has
already been called above.

> +
> +		copy_full_state(new, cur);
> +		new.req_prio = 0;
> +	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
> +	/* Reset worked. Report timeout. */
> +	return -EBUSY;
> +
> +success:
> +	/* Store the real state */
> +	copy_full_state(ctxt->state, cur);
> +	ctxt->hostile = false;
> +	err = 0;
> +
> +cleanup:
> +	cons_cleanup_handover(ctxt);
> +	return err;
> +}
> +
> +/**
> + * __cons_try_acquire - Try to acquire the console for printk output
> + * @ctxt:	Pointer to an acquire context that contains
> + *		all information about the acquire mode
> + *
> + * Returns: True if the acquire was successful. False on fail.
> + *
> + * In case of success @ctxt->state contains the acquisition
> + * state.
> + *
> + * In case of fail @ctxt->old_state contains the state
> + * that was read from @con->state for analysis by the caller.
> + */
> +static bool __cons_try_acquire(struct cons_context *ctxt)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	struct console *con = ctxt->console;
> +	short flags = console_srcu_read_flags(con);
> +	struct cons_state old;
> +	struct cons_state new;
> +	int err;
> +
> +	if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> +		return false;
> +again:
> +	cons_state_read(con, CON_STATE_CUR, &old);
> +
> +	/* Preserve it for the caller and for spinwait */
> +	copy_full_state(ctxt->old_state, old);
> +
> +	if (cons_check_panic())
> +		return false;
> +
> +	/* Set up the new state for takeover */
> +	copy_full_state(new, old);
> +	new.locked = 1;
> +	new.cur_prio = ctxt->prio;
> +	new.req_prio = CONS_PRIO_NONE;
> +	new.cpu = cpu;
> +
> +	/* Attempt to acquire it directly if unlocked */
> +	if (!old.locked) {
> +		if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> +			goto again;
> +
> +		ctxt->hostile = false;
> +		copy_full_state(ctxt->state, new);
> +		goto success;
> +	}
> +	/*
> +	 * If the active context is on the same CPU then there is
> +	 * obviously no handshake possible.
> +	 */
> +	if (old.cpu == cpu)
> +		goto check_hostile;
> +
> +	/*
> +	 * If a handover request with same or higher priority is already
> +	 * pending then this context cannot setup a handover request.
> +	 */
> +	if (old.req_prio >= ctxt->prio)
> +		goto check_hostile;
> +
> +	/*
> +	 * If the caller did not request spin-waiting then performing a
> +	 * handover is not an option.
> +	 */
> +	if (!ctxt->spinwait)
> +		goto check_hostile;
> +
> +	/*
> +	 * Setup the request in state[REQ]. If this fails then this
> +	 * context is not allowed to setup a handover request.
> +	 */
> +	if (!cons_setup_handover(ctxt))
> +		goto check_hostile;
> +
> +	/*
> +	 * Setup the request in state[CUR]. Hand in the state that was
> +	 * used to make the decision to spinwait above, for comparison. If
> +	 * this fails then unexpected state values were encountered and the
> +	 * full request setup process is retried.
> +	 */
> +	if (!cons_setup_request(ctxt, old))
> +		goto again;
> +
> +	/*
> +	 * Spin-wait to acquire the console. If this fails then unexpected
> +	 * state values were encountered (for example, a hostile takeover by
> +	 * another context) and the full request setup process is retried.
> +	 */
> +	err = cons_try_acquire_spin(ctxt);
> +	if (err) {
> +		if (err == -EAGAIN)
> +			goto again;
> +		goto check_hostile;
> +	}
> +success:
> +	/* Common updates on success */
> +	return true;
> +
> +check_hostile:
> +	if (!ctxt->hostile)
> +		return false;
> +
> +	if (cons_check_panic())
> +		return false;
> +
> +	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> +		goto again;
> +
> +	copy_full_state(ctxt->state, new);
> +	goto success;
> +}
> +
> +/**
> + * cons_try_acquire - Try to acquire the console for printk output
> + * @ctxt:	Pointer to an acquire context that contains
> + *		all information about the acquire mode
> + *
> + * Returns: True if the acquire was successful. False on fail.
> + *
> + * In case of success @ctxt->state contains the acquisition
> + * state.
> + *
> + * In case of fail @ctxt->old_state contains the state
> + * that was read from @con->state for analysis by the caller.
> + */
> +static bool cons_try_acquire(struct cons_context *ctxt)
> +{
> +	if (__cons_try_acquire(ctxt))
> +		return true;
> +
> +	ctxt->state.atom = 0;
> +	return false;
> +}
> +
> +/**
> + * __cons_release - Release the console after output is done
> + * @ctxt:	The acquire context that contains the state
> + *		at cons_try_acquire()
> + *
> + * Returns:	True if the release was regular
> + *
> + *		False if the console is in unusable state or was handed over
> + *		with handshake or taken	over hostile without handshake.
> + *
> + * The return value tells the caller whether it needs to evaluate further
> + * printing.
> + */
> +static bool __cons_release(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +	short flags = console_srcu_read_flags(con);
> +	struct cons_state hstate;
> +	struct cons_state old;
> +	struct cons_state new;
> +
> +	if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> +		return false;
> +
> +	cons_state_read(con, CON_STATE_CUR, &old);
> +again:
> +	if (!cons_state_bits_match(old, ctxt->state))
> +		return false;
> +
> +	/* Release it directly when no handover request is pending. */
> +	if (!old.req_prio)
> +		goto unlock;
> +
> +	/* Read the handover target state */
> +	cons_state_read(con, CON_STATE_REQ, &hstate);
> +
> +	/* If the waiter gave up hstate is 0 */
> +	if (!hstate.atom)
> +		goto unlock;
> +
> +	/*
> +	 * If a higher priority waiter raced against a lower priority
> +	 * waiter then unlock instead of handing over to either. The
> +	 * higher priority waiter will notice the updated state and
> +	 * retry.
> +	 */
> +	if (hstate.cur_prio != old.req_prio)
> +		goto unlock;
> +
> +	/* Switch the state and preserve the sequence on 64bit */
> +	copy_bit_state(new, hstate);
> +	copy_seq_state64(new, old);
> +	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> +		goto again;
> +
> +	return true;
> +
> +unlock:
> +	/* Clear the state and preserve the sequence on 64bit */
> +	new.atom = 0;
> +	copy_seq_state64(new, old);
> +	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> +		goto again;
> +

The entire logic is quite tricky. I checked many possible races
and everything looked fine. But then (after asking many "what ifs")
I found this rather subtle problem:

CPU0				CPU1

// releasing the lock		// spinning 
__cons_release()		cons_try_acquire_spin()

// still see CPU1 spinning
cons_state_read(REQ, hstate);

				// timeout
				// clear REQ
				cons_cleanup_handover(ctxt);

				// still see CPU0 locked
				const_state_read(CUR)

// assign con to CPU1
cons_state_try_cmpxchg(con, CUR, old, hstate)

// try to get the lock again
__cons_try_acquire();

// see it locked by CPU1
cons_state_read(con);

// set CPU0 into REQ
// it has been cleared by CPU1
// => success
cons_setup_handover()

// set req_prio in CUR
// cur.req_prio is 0 because
// we did set CPU1 as the owner.
// => success
cons_setup_request()

Note: Everything looks fine at this moment.
      CPU1 is about to realize that it became the owner.
      CPU0 became the waiter.
      BUT...


				// try clear req_prio in CUR
				// fails because we became the owner
				// CUR.cpu has changed to CPU1
				while (!cmpxchg(con, CUR, old, new))

				// re-read CUR
				const_state_read(CUR)

				// it misses that it is the owner because
				// .req_prio is already set for CPU0
				cons_state_bits_match(cur, ctxt->hov_state)

				if (cur.bits != ctxt->req_state.bits)
				       goto clean up;


BANG: It goes to clean up because cur.cpu is CPU1. But the original
      req_state.cpu is CPU0. The ctxt->req_state was set at
      the beginning when CUR was owned by CPU0.

      As a result, con->atomic_state[CUR] says that the console is
      locked by CPU1. But CPU1 is not aware of this.


My opinion:

I think that the current code is more error prone than it could be.
IMHO, the main problems are:

    + the checks of the owner fail when other, unrelated bits
      are modified

    + also the handshake between the CUR and REQ state is
      complicated on both the try_acquire and release sides. And
      we haven't even started talking about barriers yet.



Ideas:

I thought about many approaches. And the following ideas looked
promising:

    + Make the owner checks reliable by explicitly checking
      .prio and .cpu values.

    + The hand shake and probably also barriers might be easier
      when release() modifies only CUR value. Then all the magic
      is done on the try_acquire part.


POC:

Note: I removed seq from the cons_state for simplicity.
      Well, I think that we could actually keep it in a separate
      variable.

/**
 * struct cons_state - console state for NOBKL consoles
 * @atom:	Compound of the state fields for atomic operations
 * @req_prio:	Priority of a request that would like to take over the lock
 * @unsafe:	Console is busy in a non takeover region
 * @prio:	The priority of the current state owner
 * @cpu:	The CPU on which the state owner runs
 *
 * Note: cur_prio and req_prio were a bit confusing in the REQ state,
 *       so I removed the cur_ prefix. The meaning is that .prio/.cpu
 *       describe the owner of this particular cons_state.
 */
struct cons_state {
	union {
		u32	atom;
		struct {
			u32 req_prio	:  2;
			u32 unsafe	:  1;
			u32 prio	:  2;
			u32 cpu		: 18;
		};
	};
};

/**
 * struct cons_context - Context for console acquire/release
 * @con:	The associated console
 * @prio:	The priority of the current context owner
 * @cpu:	The CPU on which the context runs
 * @cur_prio:	The priority of the CUR state owner (cache)
 * @cur_cpu:	The CPU on which the CUR state owner runs (cache)
 * @spinwait_max_us: Limit for spinwait acquire
 * @stay_safe:	Take the lock the hostile way only when the console
 *		driver is in a safe state.
 *
 * The cur_* cached values are taken at the beginning of
 * cons_try_acquire() when it fails to take the CUR state (lock)
 * directly. They allow manipulating the CUR state later. They are
 * valid as long as the CUR owner stays the same.
 */
struct cons_context {
	struct console		*con;
	u32			prio     :2;
	u32			cpu      :18;
	u32			cur_prio :2;
	u32			cur_cpu  :18;
	unsigned int		spinwait_max_us;
	bool			stay_safe;
};

/**
 * cons_state_owner_matches() - check if the owner of the cons_context
 *	matches the cons_state owner.
 *
 * An alternative would be to put .prio and .cpu into a union and
 * compare both directly, something like:
 *
 *	union {
 *		u32 owner: 20;
 *		struct {
 *			u32 prio : 2;
 *			u32 cpu  : 18;
 *		};
 *	};
 *
 *  and use (state.owner == ctxt.owner) directly in the code.
 */
bool cons_state_owner_matches(struct cons_context *ctxt,
			      struct cons_state *state)
{
	if (state->prio != ctxt->prio)
		return false;

	if (state->cpu != ctxt->cpu)
		return false;

	return true;
}

/**
 * cons_cur_state_owner_matches() - check if the owner of the given
 *	state still matches the owner of the CUR state cached in the
 *	given context struct.
 *
 *  It allows ignoring other state changes as long as the CUR
 *  state owner stays the same.
 */
bool cons_cur_state_owner_matches(struct cons_context *ctxt,
				  struct cons_state *state)
{
	if (state->prio != ctxt->cur_prio)
		return false;

	if (state->cpu != ctxt->cur_cpu)
		return false;

	return true;
}

/*
 * Release the lock but preserve the request so that the lock
 * stays blocked for the request.
 *
 * @passing: Release the lock only when there is still a request.
 *	Use this option when the current owner passes the lock
 *	prematurely on request from a context with a higher
 *	priority. It prevents losing the lock when the request
 *	timed out in the meantime.
 */
bool __cons_release(struct cons_context *ctxt, bool passing)
{
	struct console *con = ctxt->con;
	struct cons_state cur, new_cur;

	cons_state_read(con, CON_STATE_CUR, &cur);
	do {
		/* Am I still the owner? */
		if (!cons_state_owner_matches(ctxt, &cur))
			return false;

		/*
		 * Avoid passing the lock when the request disappeared
		 * in the meantime.
		 */
		if (passing && !cur.req_prio)
			return false;

		/*
		 * Prepare the unlocked state. But preserve .req_prio. It
		 * will keep the lock blocked for the REQ owner context.
		 */
		new_cur.atom = 0;
		new_cur.req_prio = cur.req_prio;
	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur));

	return true;
}

bool cons_release(struct cons_context *ctxt)
{
	return __cons_release(ctxt, false);
}

bool cons_release_pass(struct cons_context *ctxt)
{
	return __cons_release(ctxt, true);
}


bool cons_can_proceed(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur;

	cons_state_read(con, CON_STATE_CUR, &cur);

	/* Am I still the owner? */
	if (!cons_state_owner_matches(ctxt, &cur))
		return false;

	/*
	 * Pass the lock when there is a request and it is safe.
	 *
	 * The current owner could still proceed when the pass
	 * failed because the request timed out.
	 */
	if (cur.req_prio && !cur.unsafe)
		return !cons_release_pass(ctxt);

	return true;
}

/*
 * Try to become the owner of the CUR state when it is free and there
 * is no pending request.
 */
int cons_try_acquire_directly(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};

	cur.atom = 0;
	if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my))
		return 0;

	/* Give up when the current owner has the same or higher priority */
	if (cur.prio >= my.prio)
		return -EBUSY;

	/*
	 * Give up when there is a pending request with the same or
	 * higher priority.
	 */
	if (cur.req_prio >= my.prio)
		return -EBUSY;

	/*
	 * Can't take it directly. Store info about the current owner.
	 * The entire try_acquire() needs to get restarted or fail
	 * when it changes.
	 */
	ctxt->cur_prio = cur.prio;
	ctxt->cur_cpu = cur.cpu;
	return -EBUSY;
}


/**
 * cons_try_acquire_request - Set a request to get the lock
 * @ctxt:	Pointer to acquire context
 *
 * Try to set REQ and make CUR aware of this request by setting .req_prio.
 *
 * Must be called only when cons_try_acquire_directly() failed.
 *
 * Return: 0 on success; -EBUSY when the lock is owned or already
 *	requested by a context with a higher or the same priority.
 *	-EAGAIN when the current owner has changed and the caller has
 *	to start from scratch.
 */
int cons_try_acquire_request(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur, new_cur, req, new_req;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};

	/*
	 * Give up when cons_try_acquire_directly() did not cache the CUR
	 * owner. It means that CUR is already owned or requested by a
	 * context with the same or higher priority.
	 */
	if (!ctxt->cur_prio)
		return -EBUSY;

	/* First, try to get REQ. */
	cons_state_read(con, CON_STATE_REQ, &req);
	do {
		/*
		 * Give up when the current request has the same or
		 * higher priority.
		 */
		if (req.prio >= my.prio)
			return -EBUSY;
	} while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &my));

	/*
	 * REQ is ours, tell CUR about our request and spin.
	 *
	 * Accept changes of other state bits as long as the owner
	 * of the console stays the same as the one that blocked
	 * direct locking.
	 */
	cons_state_read(con, CON_STATE_CUR, &cur);
	while (cons_cur_state_owner_matches(ctxt, &cur)) {
		new_cur.atom = cur.atom;
		new_cur.req_prio = ctxt->prio;

		if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur))
			return 0;
	}

	/*
	 * Bad luck. The lock owner has changed. The lock was either
	 * released, or released and taken by another context, or
	 * taken the harsh way by a higher priority request.
	 *
	 * Drop our request if it is still there and try again from
	 * the beginning.
	 */
	req.atom = my.atom;
	new_req.atom = 0;
	cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &new_req);
	return -EAGAIN;
}

/*
 * cons_try_acquire_spinning - wait for the lock
 * @ctxt:	Pointer to acquire context
 *
 * This can be called only when the context has successfully set up the
 * request in REQ and CUR has .req_prio set.
 *
 * Return: 0 on success; -EBUSY when a context with a higher priority
 *	took over our request or the lock, or when the current owner
 *	has not passed the lock within the timeout.
 */
int cons_try_acquire_spinning(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur, new_cur, req, new_req;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};
	bool locked = false;
	int timeout;

	/* Wait for the other side to hand over */
	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
		/* Try to get the lock if it was released. */
		cur.atom = 0;
		cur.req_prio = ctxt->prio;
		if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my)) {
			locked = true;
			break;
		}

		/*
		 * Give up when another context overrode our request.
		 * It must have had a higher priority. The cleanup
		 * below handles the timeout and the override the
		 * same way.
		 */
		cons_state_read(con, CON_STATE_REQ, &req);
		if (!cons_state_owner_matches(ctxt, &req))
			break;

		/* Account time. */
		udelay(1);
	}

	if (!locked) {
		/*
		 * Timed out or overridden. Have to remove .req_prio.
		 *
		 * Ignore changes in other flags as long as the owner is
		 * the same as the one that blocked direct locking.
		 * Also .req_prio must be ours.
		 */
		cons_state_read(con, CON_STATE_CUR, &cur);
		do {
			/*
			 * Only a context with higher priority could override
			 * our request. It must have replaced REQ already.
			 */
			if (cur.req_prio != ctxt->prio)
				return -EBUSY;

			new_cur.atom = cur.atom;
			new_cur.req_prio = 0;
		} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur));
	}

	/*
	 * We get here either after timing out or after getting the lock
	 * by spinning. REQ must be cleaned up in both situations.
	 *
	 * It might fail when REQ has been taken by a context with a higher
	 * priority. It is OK.
	 */
	req.atom = my.atom;
	new_req.atom = 0;
	cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &new_req);

	return locked ? 0 : -EBUSY;
}

int cons_try_acquire_hostile(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};

	if (ctxt->prio < CONS_PRIO_EMERGENCY)
		return -EBUSY;

	if (cons_check_panic())
		return -EBUSY;

	/*
	 * OK, be hostile and just grab it when safe.
	 *
	 * Taking the lock when it is not safe makes sense only as the
	 * last resort, when seeing the log is the last wish of a dying
	 * system.
	 */
	cons_state_read(con, CON_STATE_CUR, &cur);
	do {
		if (ctxt->stay_safe && cur.unsafe)
			return -EBUSY;

		/*
		 * Make sure that the lock has not been taken or requested
		 * by a context with an even higher priority.
		 */
		if (cur.prio > my.prio || cur.req_prio > my.prio)
			return -EBUSY;
	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my));

	return 0;
}

bool cons_try_acquire(struct cons_context *ctxt)
{
	int err;

try_again:
	ctxt->cur_prio = 0;

	err = cons_try_acquire_directly(ctxt);
	if (!err)
		return true;

	err = cons_try_acquire_request(ctxt);
	if (err == -EAGAIN)
		goto try_again;
	if (err == -EBUSY)
		return false;

	err = cons_try_acquire_spinning(ctxt);
	if (!err)
		return true;

	err = cons_try_acquire_hostile(ctxt);
	if (!err)
		return true;

	return false;
}


The above POC is not even compile tested. And it is possible that
I missed something important. I rewrote it many times and this is
the best that I was able to produce.

The main principle is still the same. I do not insist on using my
variant. But some ideas might be useful. The main changes:

     + The code compares the owner (.prio + .cpu) instead of the whole
       state atom. It helps to see what is important and what we are
       really checking, and to ignore unrelated changes in the atomic
       state. Also we need to remember only two owners in struct
       cons_context instead of the 4 cached values.

     + cons_release() just removes the owner. It keeps .req_prio
       so that the lock still stays blocked by the request.

       As a result, cons_release() and cons_can_proceed() are
       easier. All the complexity is in the try_acquire part.
       I am not completely sure, but it might be easier to
       follow the logic this way.

       I mean that it should be easier to see that we set/check/clear
       the various states and bits in the right order. It might also
       make it easier to see and document barriers.


     + Removed the .cur_ prefix. I think that it does more harm
       than good. Especially, it is confusing in the REQ state
       and in struct cons_context.


     + I split the code into functions a bit differently.
       It was a result of trying various approaches. It is still
       hairy at some points. I am not sure if it is better or
       worse than your code. The different split is not really
       important.

     + I removed the 32-bit sequence number from the state.
       I think that it might live in a separate 32-bit
       atomic variable. It can be updated using cmpxchg when
       the current owner finishes writing a record (see the
       sketch after this list).

       IMHO, there is no real advantage in passing the seq number
       with the lock. Everything will be good when the lock is passed
       correctly. There will always be duplicate lines (parts of lines)
       when the lock is passed to a higher priority context
       prematurely.

       But I might be missing something. I did not really look at
       the sequence-number-related races yet.


     + I also did not add the other flags added by later patches. The
       intention of this POC was to play with the code. It was
       useful even if it won't be used. It helped me to understand
       your code and see various catches.
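
The sketch mentioned in the sequence-number item above, keeping the
sequence in its own atomic variable (just an illustration of the idea):

	/*
	 * Advance the per-console sequence counter after the owner
	 * finished writing a record. Losing the cmpxchg race is fine:
	 * someone else already advanced it past us.
	 */
	static void cons_seq_try_advance(atomic_t *seq, int expected)
	{
		atomic_try_cmpxchg(seq, &expected, expected + 1);
	}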


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-17 14:56     ` John Ogness
@ 2023-03-20 16:10       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-20 16:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Fri 2023-03-17 16:02:12, John Ogness wrote:
> Hi Petr,
> 
> On oftc#printk you mentioned that I do not need to go into details
> here. But I would like to confirm your understanding and clarify some
> minor details.
> 
> On 2023-03-13, Petr Mladek <pmladek@suse.com> wrote:
> > 2. There are 4 priorities. They describe the type of the context that is
> >    either owning the console or which would like to get the owner
> >    ship.
> 
> Yes, however (and I see now the kerneldoc is not very clear about this),
> the priorities are not really about _printing_ on the console, but
> instead about _owning_ the console. This is an important distinction
> because console drivers will also acquire the console for non-printing
> activities (such as setting up their baud rate, etc.).

Makes sense. I have missed this use-case of the lock.

> >    These priorities have the following meaning:
> >
> >        + NONE: when the console is idle
> 
> "unowned" is a better term than "idle".

Makes sense. Or maybe "free" or "released".

> >        + NORMAL: the console is owned by the kthread
> 
> NORMAL really means ownership for normal usage (i.e. an owner that is
> not in an emergency or panic situation).
> 
> >        + EMERGENCY: The console is called directly from printk().
> > 	   It is used when printing some emergency messages, like
> > 	   WARN(), watchdog splat.
> 
> This priority of ownership will only be used when printing emergency
> messages. It does not mean that printk() does direct printing. The
> atomic printing occurs as a flush when releasing the ownership. This
> allows the full backtrace to go into the ringbuffer before flushing (as
> we decided at LPC2022).

I see. I have missed this as well.

> >
> > Common rule: The caller never tries to take over the lock
> >     from another owner of the same priority (?)
> 
> Correct. Although I could see there being an argument to let an
> EMERGENCY priority take over another EMERGENCY. For example, an owning
> EMERGENCY CPU could hang and another CPU triggers the NMI stall message
> (also considered emergency messages), in which case it would be helpful
> to take over ownership from the hung CPU in order to finish flushing.

I agree that it would be useful. Another motivation would be to reduce
the risk of stalling the current lock owner. I mean to have a variant
of console_trylock_spinning() also for these consoles in the EMERGENCY
priority.


> > Current owner:
> >
> >   + Must always do non-atomic operations in the "unsafe" context.
> 
> Each driver must decide for itself how it defines unsafe. But generally
> speaking it will be a block of code involving modifying multiple
> registers.
> 
> >   + Must check if they still own the lock or if there is a request
> >     to pass the lock before manipulating the console state or reading
> >     the shared buffers.
> 
> ... or continuing to touch its registers.
> 
> >   + Should pass the lock to a context with a higher priority.
> >     It must be done only in a "safe" state. But it might be in
> >     the middle of the record.
> 
> The function to check also handles the handing over. So a console
> driver, when checking, may suddenly see that it is no longer the owner
> and must either carefully back out or re-acquire ownership to finish
> what it started.

Just to be sure: the owner could finish what it started only when
the other owner did not make conflicting changes in the meantime.

For example, it could not finish writing of a line because the
other owner could have reused the buffer or already flushed
the line in the meantime.


> (For example, for the 8250, if an owning context
> disabled interrupts and then lost ownership, it _must_ re-acquire the
> console to re-enable the interrupts.)
> 
> > Passing the owner:
> >
> >    + The current owner sets con->atomic_state[CUR] according
> >      to the info in con->atomic_state[REQ] and bails out.
> >
> >    + The waiter notices that it became the owner by finding its
> >      requested state in con->atomic_state[CUR]
> >
> >    + The most tricky situation is when the current owner
> >      is passing the lock and the waiter is giving up
> >      because of the timeout. The current owner could pass
> >      the lock only when the waiter is still watching.
> 
> Yes, yes, and yes. Since the waiter must remove its request from
> con->atomic_state[CUR] before giving up, it guarantees the current owner
> will see that the waiter is gone because any cmpxchg will fail and the
> current owner will need to re-read con->atomic_state[CUR] (in which case
> it sees there is no waiter).
> 
> > Other:
> >
> >    + Atomic consoles ignore con->seq. Instead they store the lower
> >      32-bit part of the sequence number in the atomic_state variable
> >      at least on 64-bit systems. They use get_next_seq() to guess
> >      the higher 32-bit part of the sequence number.
> 
> Yes, because con->seq is protected by the console_lock, which nbcons do
> not use.

Yup.

> > Questions:
> >
> > How exactly do we handle the early boot before kthreads are ready,
> > please? It looks like we just wait for the kthread.
> 
> Every vprintk_emit() will call into cons_atomic_flush(), which will
> atomically flush the consoles if their threads do not exist. Looking at
> the code, I see it deserves a comment about this (inside the
> for_each_console_srcu loop in cons_atomic_flush()).

I see. I have missed this as well. I haven't checked the later
patches in detail yet.

> > Does the above summary describe the behavior, please?
> > Or does the code handle some situation another way?
> 
> Generally speaking, you have a pretty good picture. I think the only
> thing that was missing was the concept that non-printing code (in
> console drivers) will also acquire the console at times.

Thanks a lot for the info.


> >> --- a/kernel/printk/printk_nobkl.c
> >> +++ b/kernel/printk/printk_nobkl.c
> >> +/**
> >> + * cons_check_panic - Check whether a remote CPU is in panic
> >> + *
> >> + * Returns: True if a remote CPU is in panic, false otherwise.
> >> + */
> >> +static inline bool cons_check_panic(void)
> >> +{
> >> +	unsigned int pcpu = atomic_read(&panic_cpu);
> >> +
> >> +	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> >> +}
> >
> > This does the same as abandon_console_lock_in_panic(). I would
> > give it some more meaningful name and use it everywhere.
> >
> > What about other_cpu_in_panic() or panic_on_other_cpu()?
> 
> I prefer the first because it sounds more like a query than a
> command.

Yup.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: simplify: was: Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-17 17:34   ` simplify: was: " Petr Mladek
@ 2023-03-21 15:36     ` Petr Mladek
  2023-04-02 18:39       ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-21 15:36 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Fri 2023-03-17 18:34:30, Petr Mladek wrote:
> Hi,
> 
> I send this before reading today's answers about the basic rules.
> 
> I have spent a few days on this answer and I do not want to delay
> it indefinitely. It documents my initial feelings about the code.
> Also it describes some ideas that may or may not be useful
> anyway.
> 
> Also there is a POC that slightly modifies the logic. But the basic
> approach remains the same.

I looked at this with a "fresh" mind. I thought about whether there
was any real advantage in the proposed change of the cons_release()
logic, i.e. to just clear .cpu and .cur_prio and let
cons_try_acquire() take over the lock.

I tried to describe my view below.

> > +/**
> > + * __cons_release - Release the console after output is done
> > + * @ctxt:	The acquire context that contains the state
> > + *		at cons_try_acquire()
> > + *
> > + * Returns:	True if the release was regular
> > + *
> > + *		False if the console is in unusable state or was handed over
> > + *		with handshake or taken	over hostile without handshake.
> > + *
> > + * The return value tells the caller whether it needs to evaluate further
> > + * printing.
> > + */
> > +static bool __cons_release(struct cons_context *ctxt)
> > +{
> > +	struct console *con = ctxt->console;
> > +	short flags = console_srcu_read_flags(con);
> > +	struct cons_state hstate;
> > +	struct cons_state old;
> > +	struct cons_state new;
> > +
> > +	if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> > +		return false;
> > +
> > +	cons_state_read(con, CON_STATE_CUR, &old);
> > +again:
> > +	if (!cons_state_bits_match(old, ctxt->state))
> > +		return false;
> > +
> > +	/* Release it directly when no handover request is pending. */
> > +	if (!old.req_prio)
> > +		goto unlock;
> > +
> > +	/* Read the handover target state */
> > +	cons_state_read(con, CON_STATE_REQ, &hstate);
> > +
> > +	/* If the waiter gave up hstate is 0 */
> > +	if (!hstate.atom)
> > +		goto unlock;
> > +
> > +	/*
> > +	 * If a higher priority waiter raced against a lower priority
> > +	 * waiter then unlock instead of handing over to either. The
> > +	 * higher priority waiter will notice the updated state and
> > +	 * retry.
> > +	 */
> > +	if (hstate.cur_prio != old.req_prio)
> > +		goto unlock;

The above check might cause CUR to be completely unlocked
even when there is a request. It is a corner case. It would happen
when a higher priority context is in the middle of overriding
an older request (it already took REQ but has not updated
CUR.req_prio yet).

As a result, any context might take CUR while the higher priority
context is re-starting the request and trying to get the lock with
the updated CUR.

It is a bit of a pity but it is not the end of the world. The higher
priority context would just need to wait for another context.

That said, my proposal would solve this in a slightly cleaner way.
CUR would stay blocked for the .req_prio context. As a result, the
REQ owner that is being overridden would become the CUR owner. And
the higher priority context would then need to set up a new REQ
against the previous REQ owner.

> > +
> > +	/* Switch the state and preserve the sequence on 64bit */
> > +	copy_bit_state(new, hstate);
> > +	copy_seq_state64(new, old);
> > +	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> > +		goto again;

The other difference is that the above code will do just half of
the request-related manipulation. It will assign CUR to the REQ owner.
The REQ owner will need to realize that it got the lock and
clean up the REQ part.

Or in other words, there are 3 pieces of information:

   + CUR owner is defined by CUR.cpu and CUR.cur_prio
   + REQ owner is defined by REQ.cpu and REQ.cur_prio
   + CUR knows about the request by CUR.req_prio

The current code modifies the pieces in this order:

CPU0				CPU1

// take a free lock
set CUR.cpu
set CUR.cur_prio

				// set request
				set REQ.cpu
				set REQ.cur_prio

				// notify CUR
				set CUR.req_prio

// re-assign the lock to CPU1
set CUR.cpu = REQ.cpu
set CUR.cur_prio = REQ.cur_prio
set CUR.req_prio = 0

				// clean REQ
				set REQ.cpu = 0
				set REQ.cur_prio = 0


In this case, CPU0 has to read REQ and does a job for CPU1.

Instead, my proposal does:

CPU0				CPU1

// take a free lock
set CUR.cpu
set CUR.cur_prio

				// set request
				set REQ.cpu
				set REQ.cur_prio

				// notify CUR
				set CUR.req_prio

// unlock CPU0
set CUR.cpu = 0
set CUR.cur_prio = 0
keep CUR.req_prio == REQ.cur_prio

				// take the lock and clean notification
				set CUR.cpu = REQ.cpu
				set CUR.cur_prio = REQ.cur_prio
				set CUR.req_prio = 0

				// clean REQ
				set REQ.cpu = 0
				set REQ.cur_prio = 0


In this case:

   + CPU0: It manipulates only CUR. And it keeps the CUR.req_prio
	   value. It does not check REQ at all.

   + CPU1: It manipulates all REQ-related variables and fields.
	   It modifies CUR.cpu and CUR.cur_prio only when
	   they are free.

It looks a bit cleaner. Also it might make it easier to think about
the barriers because each side touches only its own variables and
fields. We might need fewer explicit barriers than when one CPU
makes a change on behalf of the other.
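
In code, the unlock side of my proposal might look like this (just
a sketch, reusing the helpers from this patch):

	cons_state_read(con, CON_STATE_CUR, &old);
again:
	if (!cons_state_bits_match(old, ctxt->state))
		return false;

	/* Drop the ownership but keep a pending request visible. */
	copy_bit_state(new, old);
	copy_seq_state64(new, old);
	new.cpu = 0;
	new.cur_prio = 0;
	/* new.req_prio stays == old.req_prio */

	if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
		goto again;

	return true;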


My view:

I would prefer to do the logic change. It might help with review
and also with long-term maintenance.

But I am not 100% sure if it is worth it. The original approach might
be good enough. The important thing is that it modifies CUR and REQ
variables and fields in the right order. And I do not see any
chicken-and-egg problems. Also the barriers should be doable.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-02 19:56 ` [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure John Ogness
  2023-03-09 14:08   ` global states: was: " Petr Mladek
  2023-03-09 15:32   ` naming: " Petr Mladek
@ 2023-03-21 16:04   ` Petr Mladek
  2023-03-27 16:28     ` John Ogness
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-21 16:04 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endeavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Per console locking instead of global locking
>   - Per console state which allows making informed decisions
>   - Stateful handover and takeover
> 
> As a first step state is added to struct console. The per console state
> is an atomic_long_t with a 32bit bit field and on 64bit also a 32bit
> sequence for tracking the last printed ringbuffer sequence number. On
> 32bit the sequence is separate from state for obvious reasons which
> requires handling a few extra race conditions.
>

It is not completely clear that this struct is stored
as atomic_long_t atomic_state[2] in struct console.

What about adding?

		atomic_long_t atomic;

> +		unsigned long	atom;
> +		struct {
> +#ifdef CONFIG_64BIT
> +			u32	seq;
> +#endif
> +			union {
> +				u32	bits;
> +				struct {
> +				};
> +			};
> +		};
> +	};
>  };
>  
>  /**
> @@ -186,6 +214,8 @@ enum cons_flags {
>   * @dropped:		Number of unreported dropped ringbuffer records
>   * @data:		Driver private data
>   * @node:		hlist node for the console list
> + *
> + * @atomic_state:	State array for NOBKL consoles; real and handover
>   */
>  struct console {
>  	char			name[16];
> @@ -205,6 +235,9 @@ struct console {
>  	unsigned long		dropped;
>  	void			*data;
>  	struct hlist_node	node;
> +
> +	/* NOBKL console specific members */
> +	atomic_long_t		__private atomic_state[2];

and using here

	struct cons_state	__private cons_state[2];

Then we could use cons_state[which].atomic to access it as
the atomic type.

Or was this on purpose? It is true that only the variable
in struct console has to be accessed in an atomic way.

Anyway, we should at least add a comment into struct console
noting that atomic_state[2] is used to store and access
struct cons_state in an atomic way. Also add a compilation
check that the size is the same.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 07/18] printk: nobkl: Add buffer management
  2023-03-02 19:56 ` [PATCH printk v1 07/18] printk: nobkl: Add buffer management John Ogness
@ 2023-03-21 16:38   ` Petr Mladek
  2023-03-23 13:38     ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-21 16:38 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:07, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> In case of hostile takeovers it must be ensured that the previous
> owner cannot scribble over the output buffer of the emergency/panic
> context. This is achieved by:
> 
>  - Adding a global output buffer instance for early boot (pre per CPU
>    data being available).
> 
>  - Allocating an output buffer per console for threaded printers once
>    printer threads become available.
> 
>  - Allocating per CPU output buffers per console for printing from
>    all contexts not covered by the other buffers.
> 
>  - Choosing the appropriate buffer is handled in the acquire/release
>    functions.
> 
> The output buffer is wrapped into a separate data structure so other
> context related fields can be added in later steps.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -166,6 +166,47 @@ static inline bool cons_check_panic(void)
>  	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
>  }
>  
> +static struct cons_context_data early_cons_ctxt_data __initdata;
> +
> +/**
> + * cons_context_set_pbufs - Set the output text buffer for the current context
> + * @ctxt:	Pointer to the acquire context
> + *
> + * Buffer selection:
> + *   1) Early boot uses the global (initdata) buffer
> + *   2) Printer threads use the dynamically allocated per-console buffers
> + *   3) All other contexts use the per CPU buffers
> + *
> + * This guarantees that there is no concurrency on the output records ever.
> + * Early boot and per CPU nesting is not a problem. The takeover logic
> + * tells the interrupted context that the buffer has been overwritten.
> + *
> + * There are two critical regions that matter:
> + *
> + * 1) Context is filling the buffer with a record. After interruption
> + *    it continues to sprintf() the record and before it goes to
> + *    write it out, it checks the state, notices the takeover, discards
> + *    the content and backs out.
> + *
> > + * 2) Context is in an unsafe critical region in the driver. After
> + *    interruption it might read overwritten data from the output
> + *    buffer. When it leaves the critical region it notices and backs
> + *    out. Hostile takeovers in driver critical regions are best effort
> + *    and there is not much that can be done about that.
> + */
> +static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +
> +	/* Thread context or early boot? */
> +	if (ctxt->thread)
> +		ctxt->pbufs = con->thread_pbufs;
> +	else if (!con->pcpu_data)
> +		ctxt->pbufs = &early_cons_ctxt_data.pbufs;
> +	else
> +		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);

What exactly do we need the per-CPU buffers for, please?
Is it for an early boot or panic or another scenario?

I would expect that per-console buffer should be enough.
The per-console atomic lock should define who owns
the per-console buffer. The buffer must be accessed
carefully because any context could lose the atomic lock.
Why is kthread special?

The per-CPU buffer actually looks dangerous. It might
be used by multiple NOBKL consoles. How is the access synchronized,
please? By console_list_lock? It is not obvious to me.


On the contrary, we might need 4 static buffers for the early
boot. For example, one atomic console might start printing
in the normal context. Second atomic console might use
the same static buffer in IRQ context. But the first console
will not realize it because it did not lose the per-CPU
atomic lock when the CPU handled the interrupt.
Or is this handled another way, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 07/18] printk: nobkl: Add buffer management
  2023-03-21 16:38   ` Petr Mladek
@ 2023-03-23 13:38     ` John Ogness
  2023-03-23 15:25       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-23 13:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
>> +static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
>> +{
>> +	struct console *con = ctxt->console;
>> +
>> +	/* Thread context or early boot? */
>> +	if (ctxt->thread)
>> +		ctxt->pbufs = con->thread_pbufs;
>> +	else if (!con->pcpu_data)
>> +		ctxt->pbufs = &early_cons_ctxt_data.pbufs;
>> +	else
>> +		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);
>
> What exactly do we need the per-CPU buffers for, please?
> Is it for an early boot or panic or another scenario?

In case of hostile takeovers, the panic context needs to have a buffer
that the previous context (on another CPU) will not scribble
in. Currently, hostile takeovers only occur during panics. In early boot
there is only 1 CPU.

> I would expect that per-console buffer should be enough.
> The per-console atomic lock should define who owns
> the per-console buffer. The buffer must be accessed
> carefully because any context could lose the atomic lock.

A context will string-print its message into the buffer. During the
string-print it cannot check if it is still the owner. Another CPU may
already be actively printing on that console.

> Why is kthread special?

I believe the idea was that the kthread is not bound to any CPU. But
since migration must be disabled when acquiring the console, there is no
purpose for the kthread to have its own buffer. I will remove it.

> The per-CPU buffer actually looks dangerous. It might
> > be used by multiple NOBKL consoles. How is the access synchronized,
> > please? By console_list_lock? It is not obvious to me.

Each console has its own set of per-CPU buffers (con->pcpu_data).

> On the contrary, we might need 4 static buffers for the early
> boot. For example, one atomic console might start printing
> in the normal context. Second atomic console might use
> the same static buffer in IRQ context. But the first console
> > will not realize it because it did not lose the per-CPU
> > atomic lock when the CPU handled the interrupt.
> Or is this handled another way, please?

You are correct! Although I think 3 initdata static buffers should
suffice. (2 if the system does not support NMI).


Your feedback points out that we are allocating a lot of extra memory
for the rare case of a hostile takeover from another CPU when in
panic. I suppose it would be enough to have a single dedicated panic
buffer to use in this case.

With all that in mind, we would have 3 initdata early buffers, a single
panic buffer, and per-console buffers. So the function would look
something like this:

static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
{
	struct console *con = ctxt->console;

	if (atomic_read(&panic_cpu) == smp_processor_id())
		ctxt->pbufs = &panic_ctxt_data.pbufs;
	else if (con->pbufs)
		ctxt->pbufs = con->pbufs;
	else
		ctxt->pbufs = &early_cons_ctxt_data[early_nbcon_nested].pbufs;
}

It should be enough to increment @early_nbcon_nested in cons_get_wctxt()
and decrement it in a new cons_put_wctxt() that is called after
cons_atomic_flush_con().
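
Something like this (sketch; cons_put_wctxt() does not exist yet and
the exact placement of the counter updates is a detail):

static unsigned int early_nbcon_nested;

static void cons_get_wctxt(struct console *con, ...)
{
	/* Track nesting (task/softirq/irq/NMI) for the early buffers. */
	if (!con->pbufs)
		early_nbcon_nested++;
	...
}

static void cons_put_wctxt(struct console *con, ...)
{
	...
	if (!con->pbufs)
		early_nbcon_nested--;
}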


Originally in tglx's design, hostile takeovers were allowed at any time,
which requires the per-CPU data per console. His idea was that the
policy about hostile takeovers should be implemented outside the nbcons
framework. However, with this newly proposed change in order to avoid
per-CPU buffers for every console, we are adding an implicit rule that
hostile takeovers only occur at panic. Maybe it is ok to hard-code this
particular policy. It would certainly save significant buffer space and
I am not sure if hostile takeovers make sense outside of a panic context.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 07/18] printk: nobkl: Add buffer management
  2023-03-23 13:38     ` John Ogness
@ 2023-03-23 15:25       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-23 15:25 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-23 14:44:43, John Ogness wrote:
> On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> > The per-CPU buffer actually looks dangerous. It might
> > be used by multiple NOBKL consoles. How is the access synchronized,
> > please? By console_list_lock? It is not obvious to me.
> 
> Each console has its own set of per-CPU buffers (con->pcpu_data).
> 
> > On the contrary, we might need 4 static buffers for the early
> > boot. For example, one atomic console might start printing
> > in the normal context. Second atomic console might use
> > the same static buffer in IRQ context. But the first console
> > will not realize it because it did not lose the per-CPU
> > atomic lock when the CPU handled the interrupt.
> > Or is this handled another way, please?
> 
> You are correct! Although I think 3 initdata static buffers should
> suffice. (2 if the system does not support NMI).

I am never completely sure about this. My understanding is that
softirqs might be processed at the end of irq_exit():

  + irq_exit()
    + __irq_exit_rcu()
      + invoke_softirq()
	+ __do_softirq()

And I see local_irq_enable() in __do_softirq() before the softirq
actions are processed. It means that there might be 4 nested contexts:

   + task
   + softirq
   + irq
   + NMI

So we need 4 buffers (3 if the system does not support NMI).
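
I.e. something like (sketch):

	/* task, softirq, irq, NMI */
	static struct cons_context_data early_cons_ctxt_data[4] __initdata;

indexed by the current nesting level.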


> Your feedback points out that we are allocating a lot of extra memory
> for the rare case of a hostile takeover from another CPU when in
> panic. I suppose it would be enough to have a single dedicated panic
> buffer to use in this case.

Yup.

> With all that in mind, we would have 3 initdata early buffers, a single
> panic buffer, and per-console buffers. So the function would look
> something like this:
> 
> static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
> {
> 	struct console *con = ctxt->console;
> 
> 	if (atomic_read(&panic_cpu) == smp_processor_id())
> 		ctxt->pbufs = &panic_ctxt_data.pbufs;
> 	else if (con->pbufs)
> 		ctxt->pbufs = con->pbufs;
> 	else
> 		ctxt->pbufs = &early_cons_ctxt_data[early_nbcon_nested].pbufs;
> }

Looks good.

> It should be enough to increment @early_nbcon_nested in cons_get_wctxt()
> and decrement it in a new cons_put_wctxt() that is called after
> cons_atomic_flush_con().

I still have to understand the logic related to
cons_atomic_flush_con() and early boot.


> Originally in tglx's design, hostile takeovers were allowed at any time,
> which requires the per-CPU data per console. His idea was that the
> policy about hostile takeovers should be implemented outside the nbcons
> framework. However, with this newly proposed change in order to avoid
> per-CPU buffers for every console, we are adding an implicit rule that
> hostile takeovers only occur at panic. Maybe it is ok to hard-code this
> particular policy. It would certainly save significant buffer space and
> I am not sure if hostile takeovers make sense outside of a panic context.

I am not sure about the hostile takeovers as well. But they might be
potentially dangerous so I would allow them only in panic for a start.
And I would avoid the per-CPU buffers if we do not need them now.
We could always make it more complicated...

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 08/18] printk: nobkl: Add sequence handling
  2023-03-02 19:56 ` [PATCH printk v1 08/18] printk: nobkl: Add sequence handling John Ogness
@ 2023-03-27 15:45   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-27 15:45 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:08, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> On 64bit systems the sequence tracking is embedded into the atomic
> console state, on 32bit it has to be stored in a separate atomic
> member. The latter needs to handle the non-atomicity in hostile
> takeover cases, while 64bit can completely rely on the state
> atomicity.

IMHO, the race on 32-bit is not a big problem. The message might be
printed twice even on 64-bit. And it does not matter which writer
updates the sequence number. The only important thing is that
the sequence number does not go backward. And that is guaranteed
by the cmpxchg.

In other words, IMHO, bundling the lower part of the sequence number
into the atomic state value does not bring any significant advantage.

On the contrary, using the same approach on both 32-bit and 64-bit
simplifies:

    + seq handling by removing one arch-specific variant
    + handling of the atomic state on 64-bit arch

> The ringbuffer sequence number is 64bit, but having a 32bit
> representation in the console is sufficient. If a console ever gets
> more than 2^31 records behind the ringbuffer then this is the least
> of the problems.

Nice trick.

> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -259,6 +261,8 @@ struct cons_context {
>  	struct cons_state	old_state;
>  	struct cons_state	hov_state;
>  	struct cons_state	req_state;
> +	u64			oldseq;
> +	u64			newseq;

I would use old_seq, new_seq. I mean using the same naming scheme as
for req_cpu, req_prio.

>  	unsigned int		spinwait_max_us;
>  	enum cons_prio		prio;
>  	struct printk_buffers	*pbufs;
> @@ -328,6 +333,9 @@ struct console {
>  
>  	/* NOBKL console specific members */
>  	atomic_long_t		__private atomic_state[2];
> +#ifndef CONFIG_64BIT
> +	atomic_t		__private atomic_seq;

I would call it nbcon_seq;

> +#endif
>  	struct printk_buffers	*thread_pbufs;
>  	struct cons_context_data	__percpu *pcpu_data;
>  };
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -511,7 +511,7 @@ _DEFINE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS,
>  
>  static struct printk_ringbuffer printk_rb_dynamic;
>  
> -static struct printk_ringbuffer *prb = &printk_rb_static;
> +struct printk_ringbuffer *prb = &printk_rb_static;
>  
>  /*
>   * We cannot access per-CPU data (e.g. per-CPU flush irq_work) before
> @@ -2728,30 +2728,39 @@ static bool abandon_console_lock_in_panic(void)
>  
>  /*
>   * Check if the given console is currently capable and allowed to print
> - * records.
> - *
> - * Requires the console_srcu_read_lock.
> + * records. If the caller only works with certain types of consoles, the
> + * caller is responsible for checking the console type before calling
> + * this function.
>   */
> -static inline bool console_is_usable(struct console *con)
> +static inline bool console_is_usable(struct console *con, short flags)
>  {
> -	short flags = console_srcu_read_flags(con);
> -
>  	if (!(flags & CON_ENABLED))
>  		return false;
>  
>  	if ((flags & CON_SUSPENDED))
>  		return false;
>  
> -	if (!con->write)
> -		return false;
> -
>  	/*
> -	 * Console drivers may assume that per-cpu resources have been
> -	 * allocated. So unless they're explicitly marked as being able to
> -	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
> +	 * The usability of a console varies depending on whether
> +	 * it is a NOBKL console or not.
>  	 */
> -	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
> -		return false;
> +
> +	if (flags & CON_NO_BKL) {
> +		if (have_boot_console)
> +			return false;
> +
> +	} else {
> +		if (!con->write)
> +			return false;

It is weird that we check whether con->write exists for the legacy consoles.
But we do not have a similar check for CON_NO_BKL consoles.

The ttynull console might actually be a good candidate for a NOBKL console.

> +		/*
> +		 * Console drivers may assume that per-cpu resources have
> +		 * been allocated. So unless they're explicitly marked as
> +		 * being able to cope (CON_ANYTIME) don't call them until
> +		 * this CPU is officially up.
> +		 */
> +		if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
> +			return false;

Anyway, I would prefer to put the console_is_usable() change into a
separate patch. It is too hidden here. And it is not really important
for the sequence number handling.

> +	}
>  
>  	return true;
>  }
> @@ -3001,9 +3010,14 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
>  
>  		cookie = console_srcu_read_lock();
>  		for_each_console_srcu(con) {
> +			short flags = console_srcu_read_flags(con);
>  			bool progress;
>  
> -			if (!console_is_usable(con))
> +			/* console_flush_all() is only for legacy consoles. */
> +			if (flags & CON_NO_BKL)
> +				continue;

Same here. This should be in a separate patch that would explain how
console_is_usable() and the flush logic are being changed for NOBKL
consoles.

> +
> +			if (!console_is_usable(con, flags))
>  				continue;
>  			any_usable = true;
>  
> @@ -3775,10 +3789,23 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  
>  		cookie = console_srcu_read_lock();
>  		for_each_console_srcu(c) {
> +			short flags;
> +
>  			if (con && con != c)
>  				continue;
> -			if (!console_is_usable(c))
> +
> +			flags = console_srcu_read_flags(c);
> +
> +			if (!console_is_usable(c, flags))
>  				continue;

Same here. Separate patch ...

> +			/*
> +			 * Since the console is locked, use this opportunity
> +			 * to update console->seq for NOBKL consoles.
> +			 */
> +			if (flags & CON_NO_BKL)
> +				c->seq = cons_read_seq(c);

This is controversial. c->seq will be out-of-date most of the time.
It might do more harm than good.

I would personally keep this value 0 or -1 for NOBKL consoles.

Note: I thought about putting it into union with the 32-bit atomic_seq.
  But it might cause confusion as well. People might read misleading
  values via con->seq variable. So, I would really keep them separate.


> +
>  			printk_seq = c->seq;
>  			if (printk_seq < seq)
>  				diff += seq - printk_seq;
> diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
> index 7db56ffd263a..7184a93a5b0d 100644
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -207,6 +208,227 @@ static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
>  		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);
>  }
>  
> +/**
> + * cons_seq_init - Helper function to initialize the console sequence
> + * @con:	Console to work on
> + *
> + * Set @con->atomic_seq to the starting record, or if that record no
> + * longer exists, the oldest available record. For init only. Do not
> + * use for runtime updates.
> + */
> +static void cons_seq_init(struct console *con)
> +{
> +	u32 seq = (u32)max_t(u64, con->seq, prb_first_valid_seq(prb));
> +#ifdef CONFIG_64BIT
> +	struct cons_state state;
> +
> +	cons_state_read(con, CON_STATE_CUR, &state);
> +	state.seq = seq;
> +	cons_state_set(con, CON_STATE_CUR, &state);
> +#else
> +	atomic_set(&ACCESS_PRIVATE(con, atomic_seq), seq);
> +#endif
> +}
> +
> +static inline u64 cons_expand_seq(u64 seq)

I would use a "__" prefix to make it clear that it should be used
carefully. And I would add a comment that it gives a correct result
only when expanding an up-to-date (recently enough read) atomic_seq
value.

> +{
> +	u64 rbseq;
> +
> +	/*
> +	 * The provided sequence is only the lower 32bits of the ringbuffer
> +	 * sequence. It needs to be expanded to 64bit. Get the next sequence
> +	 * number from the ringbuffer and fold it.
> +	 */
> +	rbseq = prb_next_seq(prb);
> +	seq = rbseq - ((u32)rbseq - (u32)seq);
> +
> +	return seq;
> +}
> +
> +static bool cons_release(struct cons_context *ctxt);
> +static bool __maybe_unused cons_seq_try_update(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +	struct cons_state state;
> +	int pcpu;
> +	u32 old;
> +	u32 new;
> +
> +	/*
> +	 * There is a corner case that needs to be considered here:
> +	 *
> +	 * CPU0			CPU1
> +	 * printk()
> +	 *  acquire()		-> emergency
> +	 *  write()		   acquire()
> +	 *  update_seq()
> +	 *    state == OK
> +	 * --> NMI
> +	 *			   takeover()
> +	 * <---			     write()
> +	 *  cmpxchg() succeeds	     update_seq()
> +	 *			     cmpxchg() fails
> +	 *

Nice explanation. It perfectly explains what might happen.
Anyway, the main "issue" is that the same message is printed twice.
And it can't be fixed even when the seq number is bundled with
the state value.

Crazy idea:  The double output might be avoided if we bundled
   the index of the used buffer and the offset of the last printed
   character into the atomic state variable. Then the other writer
   might continue where the interrupted writer ended.

   But I am not sure if it is worth it. I would keep it easy and live
   with the duplicated lines. We could always make it more complicated
   later.

> +	 * There is nothing that can be done about this other than having
> +	 * yet another state bit that needs to be tracked and analyzed,
> +	 * but fails to cover the problem completely.
> +	 *
> +	 * No other scenarios expose such a problem. On same CPU takeovers
> +	 * the cmpxchg() always fails on the interrupted context after the
> +	 * interrupting context finished printing, but that's fine as it
> +	 * does not own the console anymore. The state check after the
> +	 * failed cmpxchg prevents that.
> +	 */
> +	cons_state_read(con, CON_STATE_CUR, &state);
> +	/* Make sure this context is still the owner. */
> +	if (!cons_state_bits_match(state, ctxt->state))
> +		return false;
> +
> +	/*
> +	 * Get the original sequence number that was retrieved
> +	 * from @con->atomic_seq. @con->atomic_seq should be still
> +	 * the same. 32bit truncates. See cons_context_set_seq().
> +	 */
> +	old = (u32)ctxt->oldseq;
> +	new = (u32)ctxt->newseq;
> +	if (atomic_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_seq), &old, new)) {
> +		ctxt->oldseq = ctxt->newseq;
> +		return true;
> +	}
> +
> +	/*
> +	 * Reread the state. If this context does not own the console anymore
> +	 * then it cannot touch the sequence again.
> +	 */
> +	cons_state_read(con, CON_STATE_CUR, &state);
> +	if (!cons_state_bits_match(state, ctxt->state))
> +		return false;
> +
> +	pcpu = atomic_read(&panic_cpu);
> +	if (pcpu == smp_processor_id()) {
> +		/*
> +		 * This is the panic CPU. Emitting a warning here does not
> +		 * help at all. The callchain is clear and the priority is
> +		 * to get the messages out. In the worst case duplicated
> +		 * ones. That's a job for postprocessing.
> +		 */
> +		atomic_set(&ACCESS_PRIVATE(con, atomic_seq), new);

This is problematic. We are here when the above cmpxchg()
failed. IMHO, there might be two reasons:

     + someone else pushed out the line, or more lines, in the meantime
     + someone reset the sequence number to replay the log again

In both cases, we should keep the value as it is. Do not update it.


On the contrary, this function always returns "true" when the above cmpxchg
succeeded. But the current context might have already lost the lock,
like in the example in the comment, which is a bit inconsistent
behavior.


I would personally split the function into two:

   1st function that will only try to update the atomic_seq using
      cmpxchg(). The semantics are simple: the caller emitted the entire
      record, so let's update the atomic_seq. Do not worry when it fails;
      it means that we probably lost the lock. But the caller needs to
      recheck the lock anyway.

   2nd function will check whether we are still the owner. cons_can_proceed()
      already does the job. It needs to be called anyway before doing
      more changes on the console.
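
For the 1st function, I mean something like (only a sketch; the name
is just for illustration):

	static void cons_seq_update(struct cons_context *ctxt)
	{
		struct console *con = ctxt->console;
		u32 old = (u32)ctxt->oldseq;
		u32 new = (u32)ctxt->newseq;

		/* A failure means that we probably lost the lock. */
		if (atomic_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_seq),
				       &old, new))
			ctxt->oldseq = ctxt->newseq;
	}

And the 2nd step is then just cons_can_proceed() in the caller before
it touches the console again.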


> +		ctxt->oldseq = ctxt->newseq;
> +		return true;
> +	}
> +
> +	/*
> +	 * Only emit a warning when this happens outside of a panic
> +	 * situation as on panic it's neither useful nor helping to let the
> +	 * panic CPU get the important stuff out.
> +	 */
> +	WARN_ON_ONCE(pcpu == PANIC_CPU_INVALID);
> +
> +	cons_release(ctxt);
> +	return false;
> +}
> +#endif

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-21 16:04   ` union: was: " Petr Mladek
@ 2023-03-27 16:28     ` John Ogness
  2023-03-28  8:20       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-27 16:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> It is not completely clear that this struct is stored
> as atomic_long_t atomic_state[2] in struct console.
>
> What about adding?
>
> 		atomic_long_t atomic;

The struct is used to simplify interpreting and creating values to be
stored in the atomic state variable. I do not think it makes sense that
the atomic variable type itself is part of it.

> Anyway, we should at least add a comment into struct console
> noting that atomic_state[2] is used to store and access
> struct cons_state in an atomic way. Also add a compilation
> check that the size is the same.

A compilation check would be nice. Is that possible?

I am renaming the struct to nbcon_state. Also the variable will be
called nbcon_state. With the description updated, I think it makes it
clearer that "struct nbcon_state" is used to interpret/create values of
console->nbcon_state.

John Ogness

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-27 16:28     ` John Ogness
@ 2023-03-28  8:20       ` Petr Mladek
  2023-03-28  9:42         ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-28  8:20 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On Mon 2023-03-27 18:34:22, John Ogness wrote:
> On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> > It is not completely clear that this struct is stored
> > as atomic_long_t atomic_state[2] in struct console.
> >
> > What about adding?
> >
> > 		atomic_long_t atomic;
> 
> The struct is used to simplify interpreting and creating values to be
> stored in the atomic state variable. I do not think it makes sense that
> the atomic variable type itself is part of it.

It was just an idea. Feel free to keep it as is (not to add the atomic
into the union).

> > Anyway, we should at least add a comment into struct console
> > noting that atomic_state[2] is used to store and access
> > struct cons_state in an atomic way. Also add a compilation
> > check that the size is the same.
> 
> A compilation check would be nice. Is that possible?

I think the following might do the trick:

static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));


> I am renaming the struct to nbcon_state. Also the variable will be
> called nbcon_state. With the description updated, I think it makes it
> clearer that "struct nbcon_state" is used to interpret/create values of
> console->nbcon_state.

Sounds good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-28  8:20       ` Petr Mladek
@ 2023-03-28  9:42         ` John Ogness
  2023-03-28 12:52           ` Petr Mladek
  2023-03-28 13:47           ` Steven Rostedt
  0 siblings, 2 replies; 92+ messages in thread
From: John Ogness @ 2023-03-28  9:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
>> A compilation check would be nice. Is that possible?
>
> I think the following might do the trick:
>
> static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));

I never realized the kernel code was allowed to have that. But it is
everywhere! :-) Thanks. I've added and tested the following:

/*
 * The nbcon_state struct is used to easily create and interpret values that
 * are stored in the console.nbcon_state variable. Make sure this struct stays
 * within the size boundaries of that atomic variable's underlying type in
 * order to avoid any accidental truncation.
 */
static_assert(sizeof(struct nbcon_state) <= sizeof(long));

Note that I am checking against sizeof(long), the underlying variable
type. We probably shouldn't assume sizeof(atomic_long_t) is always
sizeof(long).

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-28  9:42         ` John Ogness
@ 2023-03-28 12:52           ` Petr Mladek
  2023-03-28 13:47           ` Steven Rostedt
  1 sibling, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-28 12:52 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-fsdevel

On Tue 2023-03-28 11:48:06, John Ogness wrote:
> On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
> >> A compilation check would be nice. Is that possible?
> >
> > I think the following might do the trick:
> >
> > static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));
> 
> I never realized the kernel code was allowed to have that. But it is
> everywhere! :-) Thanks. I've added and tested the following:
> 
> /*
>  * The nbcon_state struct is used to easily create and interpret values that
>  * are stored in the console.nbcon_state variable. Make sure this struct stays
>  * within the size boundaries of that atomic variable's underlying type in
>  * order to avoid any accidental truncation.
>  */
> static_assert(sizeof(struct nbcon_state) <= sizeof(long));
> 
> Note that I am checking against sizeof(long), the underlying variable
> type. We probably shouldn't assume sizeof(atomic_long_t) is always
> sizeof(long).

Makes sense and looks good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* locking API: was: [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-02 19:58 ` [PATCH printk v1 00/18] serial: 8250: implement non-BKL console John Ogness
@ 2023-03-28 13:33   ` Petr Mladek
  2023-03-28 13:57     ` John Ogness
  2023-03-28 13:59   ` [PATCH printk v1 00/18] POC: serial: 8250: implement nbcon console John Ogness
  1 sibling, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-28 13:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

On Thu 2023-03-02 21:04:50, John Ogness wrote:
> Implement the necessary callbacks to allow the 8250 console driver
> to perform as a non-BKL console. Remove the implementation for the
> legacy console callback (write) and add implementations for the
> non-BKL consoles (write_atomic, write_thread, port_lock) and add
> CON_NO_BKL to the initial flags.
> 
> This is an all-in-one commit meant only for testing the new printk
> non-BKL infrastructure. It is not meant to be included in mainline in
> this form. In particular, it includes mainline driver fixes that
> need to be submitted individually.
> 
> Although non-BKL consoles can coexist with legacy consoles, you
> will only receive all the benefits of the non-BKL consoles if
> this console driver is the only console. That means no netconsole,
> no tty1, no earlyprintk, no earlycon. Just the uart8250.
> 
> For example: console=ttyS0,115200
> 
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> +static void atomic_console_reacquire(struct cons_write_context *wctxt,
> +				     struct cons_write_context *wctxt_init)
> +{
> +	memcpy(wctxt, wctxt_init, sizeof(*wctxt));
> +	while (!console_try_acquire(wctxt)) {
> +		cpu_relax();
> +		memcpy(wctxt, wctxt_init, sizeof(*wctxt));
> +	}
> +}
> +
>  /*
> - * Print a string to the serial port using the device FIFO
> - *
> - * It sends fifosize bytes and then waits for the fifo
> - * to get empty.
> + * It should be possible to support a hostile takeover in an unsafe
> + * section if it is write_atomic() that is being taken over. But where
> + * to put this policy?
>   */
> -static void serial8250_console_fifo_write(struct uart_8250_port *up,
> -					  const char *s, unsigned int count)
> +bool serial8250_console_write_atomic(struct uart_8250_port *up,
> +				     struct cons_write_context *wctxt)
>  {
> -	int i;
> -	const char *end = s + count;
> -	unsigned int fifosize = up->tx_loadsz;
> -	bool cr_sent = false;
> -
> -	while (s != end) {
> -		wait_for_lsr(up, UART_LSR_THRE);
> -
> -		for (i = 0; i < fifosize && s != end; ++i) {
> -			if (*s == '\n' && !cr_sent) {
> -				serial_out(up, UART_TX, '\r');
> -				cr_sent = true;
> -			} else {
> -				serial_out(up, UART_TX, *s++);
> -				cr_sent = false;
> -			}
> +	struct cons_write_context wctxt_init = {};
> +	struct cons_context *ctxt_init = &ACCESS_PRIVATE(&wctxt_init, ctxt);
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	bool can_print = true;
> +	unsigned int ier;
> +
> +	/* With write_atomic, another context may hold the port->lock. */
> +
> +	ctxt_init->console = ctxt->console;
> +	ctxt_init->prio = ctxt->prio;
> +	ctxt_init->thread = ctxt->thread;
> +
> +	touch_nmi_watchdog();
> +
> +	/*
> +	 * Enter unsafe in order to disable interrupts. If the console is
> +	 * lost before the interrupts are disabled, bail out because another
> +	 * context took over the printing. If the console is lost after the
> +	 * interrupts are disabled, the console must be reacquired in order
> +	 * to re-enable the interrupts. However in that case no printing is
> +	 * allowed because another context took over the printing.
> +	 */
> +
> +	if (!console_enter_unsafe(wctxt))
> +		return false;
> +
> +	if (!__serial8250_clear_IER(up, wctxt, &ier))
> +		return false;
> +
> +	if (console_exit_unsafe(wctxt)) {
> +		can_print = atomic_print_line(up, wctxt);
> +		if (!can_print)
> +			atomic_console_reacquire(wctxt, &wctxt_init);

I am trying to review the 9th patch adding console_can_proceed(),
console_enter_unsafe(), console_exit_unsafe() API. And I wanted
to see how the struct cons_write_context was actually used.

I am confused now. I do not understand the motivation for the extra
@wctxt_init copy and atomic_console_reacquire().

Why do we need a copy? And why we need to reacquire it?

My feeling is that it is needed only to call
console_exit_unsafe(wctxt) later. Or am I missing anything?

> +
> +		if (can_print) {
> +			can_print = console_can_proceed(wctxt);
> +			if (can_print)
> +				wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
> +			else
> +				atomic_console_reacquire(wctxt, &wctxt_init);
> +		}
> +	} else {
> +		atomic_console_reacquire(wctxt, &wctxt_init);
> +	}
> +
> +	/*
> +	 * Enter unsafe in order to enable interrupts. If the console is
> +	 * lost before the interrupts are enabled, the console must be
> +	 * reacquired in order to re-enable the interrupts.
> +	 */
> +
> +	for (;;) {
> +		if (console_enter_unsafe(wctxt) &&
> +		    __serial8250_set_IER(up, wctxt, ier)) {
> +			break;
>  		}
> +
> +		/* HW-IRQs still disabled. Reacquire to enable them. */
> +		atomic_console_reacquire(wctxt, &wctxt_init);
>  	}
> +
> +	console_exit_unsafe(wctxt);
> +
> +	return can_print;
>  }

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: union: was: Re: [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure
  2023-03-28  9:42         ` John Ogness
  2023-03-28 12:52           ` Petr Mladek
@ 2023-03-28 13:47           ` Steven Rostedt
  1 sibling, 0 replies; 92+ messages in thread
From: Steven Rostedt @ 2023-03-28 13:47 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-fsdevel

On Tue, 28 Mar 2023 11:48:06 +0206
John Ogness <john.ogness@linutronix.de> wrote:

> > static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));  
> 
> I never realized the kernel code was allowed to have that. But it is
> everywhere! :-) Thanks. I've added and tested the following:

I didn't know about static_assert(), as I always used BUILD_BUG_ON().

The difference being that BUILD_BUG_ON() has to be used within a function,
whereas static_assert() can be done outside of functions. Hmm, maybe I can
convert some of my BUILD_BUG_ON()s to static_assert()s.

Learn something new every day.

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: locking API: was: [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-28 13:33   ` locking API: was: " Petr Mladek
@ 2023-03-28 13:57     ` John Ogness
  2023-03-28 15:10       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-28 13:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
>> +	if (!__serial8250_clear_IER(up, wctxt, &ier))
>> +		return false;
>> +
>> +	if (console_exit_unsafe(wctxt)) {
>> +		can_print = atomic_print_line(up, wctxt);
>> +		if (!can_print)
>> +			atomic_console_reacquire(wctxt, &wctxt_init);
>
> I am trying to review the 9th patch adding console_can_proceed(),
> console_enter_unsafe(), console_exit_unsafe() API. And I wanted
> to see how the struct cons_write_context was actually used.

First off, I need to post the latest version of the 8250-POC patch. It
is not officially part of this series and is still going through changes
for the PREEMPT_RT tree. I will post the latest version directly after
answering this email.

> I am confused now. I do not understand the motivation for the extra
> @wctxt_init copy and atomic_console_reacquire().

If an atomic context loses ownership while doing certain activities, it
may need to re-acquire ownership in order to finish or cleanup what it
started.

> Why do we need a copy?

When ownership is lost, the context is cleared. In order to re-acquire,
an original copy of the context is needed. There is no technical reason
to clear the context, so maybe the context should not be cleared after a
takeover. Otherwise, many drivers will need to implement the "backup
copy" solution.

> And why we need to reacquire it?

In this particular case the context has disabled interrupts. No other
context will re-enable interrupts because the driver is implemented such
that the one who disables is the one who enables. So this context must
re-acquire ownership in order to re-enable interrupts.

> My feeling is that it is needed only to call
> console_exit_unsafe(wctxt) later. Or am I missing anything?

No. It is only about re-enabling interrupts. The concept of unsafe is
not really relevant if a hostile takeover occurs during an unsafe
section. In that case it becomes a "hope and pray" effort at the end
of panic().

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 00/18] POC: serial: 8250: implement nbcon console
  2023-03-02 19:58 ` [PATCH printk v1 00/18] serial: 8250: implement non-BKL console John Ogness
  2023-03-28 13:33   ` locking API: was: " Petr Mladek
@ 2023-03-28 13:59   ` John Ogness
  1 sibling, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-03-28 13:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

Implement the necessary callbacks to allow the 8250 console driver
to perform as an nbcon console. Remove the implementation for the
legacy console callback (write) and add implementations for the
nbcon consoles (write_atomic, write_thread, port_lock) and add
CON_NBCON to the initial flags.

Although nbcon consoles can coexist with legacy consoles, you
will only receive all the benefits of the nbcon consoles if
this console driver is the only console. That means no netconsole,
no tty1, no earlyprintk, no earlycon. Just the uart8250.

For example: console=ttyS0,115200

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 drivers/tty/serial/8250/8250.h              | 269 +++++++++++++-
 drivers/tty/serial/8250/8250_aspeed_vuart.c |   2 +-
 drivers/tty/serial/8250/8250_bcm7271.c      |  23 +-
 drivers/tty/serial/8250/8250_core.c         |  50 ++-
 drivers/tty/serial/8250/8250_exar.c         |   8 +-
 drivers/tty/serial/8250/8250_fsl.c          |   3 +-
 drivers/tty/serial/8250/8250_mtk.c          |  30 +-
 drivers/tty/serial/8250/8250_omap.c         |  36 +-
 drivers/tty/serial/8250/8250_port.c         | 369 +++++++++++++++-----
 drivers/tty/serial/8250/Kconfig             |   1 +
 drivers/tty/serial/serial_core.c            |  10 +-
 include/linux/serial_8250.h                 |  11 +-
 12 files changed, 686 insertions(+), 126 deletions(-)

diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index 287153d32536..beb77af4ec62 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -177,12 +177,277 @@ static inline void serial_dl_write(struct uart_8250_port *up, int value)
 	up->dl_write(up, value);
 }
 
+static inline bool serial8250_is_console(struct uart_port *port)
+{
+	return uart_console(port) && !hlist_unhashed_lockless(&port->cons->node);
+}
+
+/**
+ * serial8250_init_wctxt - Initialize a write context for
+ *	non-console-printing usage
+ * @wctxt:	The write context to initialize
+ * @cons:	The console to assign to the write context
+ *
+ * In order to mark an unsafe region, drivers must acquire the console. This
+ * requires providing an initialized write context (even if that driver will
+ * not be doing any printing).
+ *
+ * This function should not be used for console printing contexts.
+ */
+static inline void serial8250_init_wctxt(struct nbcon_write_context *wctxt,
+					 struct console *cons)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	memset(wctxt, 0, sizeof(*wctxt));
+	ctxt->console = cons;
+	ctxt->prio = NBCON_PRIO_NORMAL;
+}
+
+/**
+ * __serial8250_console_acquire - Acquire a console for
+ *	non-console-printing usage
+ * @wctxt:	An uninitialized write context to use for acquiring
+ * @cons:	The console to assign to the write context
+ *
+ * The caller is holding the port->lock.
+ * The caller is holding the console_srcu_read_lock.
+ *
+ * This function should not be used for console printing contexts.
+ */
+static inline void __serial8250_console_acquire(struct nbcon_write_context *wctxt,
+						struct console *cons)
+{
+	for (;;) {
+		serial8250_init_wctxt(wctxt, cons);
+		if (nbcon_try_acquire(wctxt))
+			break;
+		cpu_relax();
+	}
+}
+
+/**
+ * serial8250_enter_unsafe - Mark the beginning of an unsafe region for
+ *		non-console-printing usage
+ * @up:	The port that is entering the unsafe state
+ *
+ * The caller should ensure @up is a console before calling this function.
+ *
+ * The caller is holding the port->lock.
+ * This function takes the console_srcu_read_lock and becomes owner of the
+ * console associated with @up.
+ *
+ * This function should not be used for console printing contexts.
+ */
+static inline void serial8250_enter_unsafe(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+
+	lockdep_assert_held_once(&port->lock);
+
+	for (;;) {
+		up->cookie = console_srcu_read_lock();
+
+		__serial8250_console_acquire(&up->wctxt, port->cons);
+
+		if (nbcon_enter_unsafe(&up->wctxt))
+			break;
+
+		console_srcu_read_unlock(up->cookie);
+		cpu_relax();
+	}
+}
+
+/**
+ * serial8250_exit_unsafe - Mark the end of an unsafe region for
+ *		non-console-printing usage
+ * @up:	The port that is exiting the unsafe state
+ *
+ * The caller is holding the port->lock.
+ * This function releases ownership of the console associated with @up and
+ * releases the console_srcu_read_lock.
+ *
+ * This function should not be used for console printing contexts.
+ */
+static inline void serial8250_exit_unsafe(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+
+	lockdep_assert_held_once(&port->lock);
+
+	if (nbcon_exit_unsafe(&up->wctxt))
+		nbcon_release(&up->wctxt);
+
+	console_srcu_read_unlock(up->cookie);
+}
+
+/**
+ * serial8250_in_IER - Read the IER register for
+ *		non-console-printing usage
+ * @up:	The port to work on
+ *
+ * Returns:	The value read from IER
+ *
+ * The caller is holding the port->lock.
+ *
+ * This is the top-level function for non-console-printing contexts to
+ * read the IER register. The caller does not need to care if @up is a
+ * console before calling this function.
+ *
+ * This function should not be used for printing contexts.
+ */
+static inline int serial8250_in_IER(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
+
+	return ier;
+}
+
+/**
+ * __serial8250_set_IER - Directly write to the IER register
+ * @up:		The port to work on
+ * @wctxt:	The current write context
+ * @ier:	The value to write
+ *
+ * Returns:	True if IER was written to. False otherwise
+ *
+ * The caller is holding the port->lock.
+ * The caller is holding the console_srcu_read_unlock.
+ * The caller is the owner of the console associated with @up.
+ *
+ * This function should only be directly called within console printing
+ * contexts. Other contexts should use serial8250_set_IER().
+ */
+static inline bool __serial8250_set_IER(struct uart_8250_port *up,
+					struct nbcon_write_context *wctxt,
+					int ier)
+{
+	if (wctxt && !nbcon_can_proceed(wctxt))
+		return false;
+	serial_out(up, UART_IER, ier);
+	return true;
+}
+
+/**
+ * serial8250_set_IER - Write a new value to the IER register for
+ *	non-console-printing usage
+ * @up:		The port to work on
+ * @ier:	The value to write
+ *
+ * The caller is holding the port->lock.
+ *
+ * This is the top-level function for non-console-printing contexts to
+ * write to the IER register. The caller does not need to care if @up is a
+ * console before calling this function.
+ *
+ * This function should not be used for printing contexts.
+ */
+static inline void serial8250_set_IER(struct uart_8250_port *up, int ier)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
+		serial8250_enter_unsafe(up);
+		while (!__serial8250_set_IER(up, &up->wctxt, ier)) {
+			console_srcu_read_unlock(up->cookie);
+			nbcon_enter_unsafe(&up->wctxt);
+		}
+		serial8250_exit_unsafe(up);
+	} else {
+		__serial8250_set_IER(up, NULL, ier);
+	}
+}
+
+/**
+ * __serial8250_clear_IER - Directly clear the IER register
+ * @up:		The port to work on
+ * @wctxt:	The current write context
+ * @prior:	Gets set to the previous value of IER
+ *
+ * Returns:	True if IER was cleared and @prior points to the previous
+ *		value of IER. False otherwise and @prior is invalid
+ *
+ * The caller is holding the port->lock.
+ * The caller is holding the console_srcu_read_lock.
+ * The caller is the owner of the console associated with @up.
+ *
+ * This function should only be directly called within console printing
+ * contexts. Other contexts should use serial8250_clear_IER().
+ */
+static inline bool __serial8250_clear_IER(struct uart_8250_port *up,
+					  struct nbcon_write_context *wctxt,
+					  int *prior)
+{
+	unsigned int clearval = 0;
+
+	if (up->capabilities & UART_CAP_UUE)
+		clearval = UART_IER_UUE;
+
+	*prior = serial_in(up, UART_IER);
+	if (wctxt && !nbcon_can_proceed(wctxt))
+		return false;
+	serial_out(up, UART_IER, clearval);
+	return true;
+}
+
+/**
+ * serial8250_clear_IER - Clear the IER register for
+ *		non-console-printing usage
+ * @up:	The port to work on
+ *
+ * Returns:	The previous value of IER
+ *
+ * The caller is holding the port->lock.
+ *
+ * This is the top-level function for non-console-printing contexts to
+ * clear the IER register. The caller does not need to care if @up is a
+ * console before calling this function.
+ *
+ * This function should not be used for printing contexts.
+ */
+static inline int serial8250_clear_IER(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int prior;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
+		serial8250_enter_unsafe(up);
+		while (!__serial8250_clear_IER(up, &up->wctxt, &prior)) {
+			console_srcu_read_unlock(up->cookie);
+			nbcon_enter_unsafe(&up->wctxt);
+		}
+		serial8250_exit_unsafe(up);
+	} else {
+		__serial8250_clear_IER(up, NULL, &prior);
+	}
+
+	return prior;
+}
+
 static inline bool serial8250_set_THRI(struct uart_8250_port *up)
 {
 	if (up->ier & UART_IER_THRI)
 		return false;
 	up->ier |= UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
@@ -191,7 +456,7 @@ static inline bool serial8250_clear_THRI(struct uart_8250_port *up)
 	if (!(up->ier & UART_IER_THRI))
 		return false;
 	up->ier &= ~UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
diff --git a/drivers/tty/serial/8250/8250_aspeed_vuart.c b/drivers/tty/serial/8250/8250_aspeed_vuart.c
index 9d2a7856784f..7cc6b527c088 100644
--- a/drivers/tty/serial/8250/8250_aspeed_vuart.c
+++ b/drivers/tty/serial/8250/8250_aspeed_vuart.c
@@ -278,7 +278,7 @@ static void __aspeed_vuart_set_throttle(struct uart_8250_port *up,
 	up->ier &= ~irqs;
 	if (!throttle)
 		up->ier |= irqs;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 }
 static void aspeed_vuart_set_throttle(struct uart_port *port, bool throttle)
 {
diff --git a/drivers/tty/serial/8250/8250_bcm7271.c b/drivers/tty/serial/8250/8250_bcm7271.c
index ed5a94747692..c6f2cd3f19b5 100644
--- a/drivers/tty/serial/8250/8250_bcm7271.c
+++ b/drivers/tty/serial/8250/8250_bcm7271.c
@@ -606,8 +606,10 @@ static int brcmuart_startup(struct uart_port *port)
 	 * Disable the Receive Data Interrupt because the DMA engine
 	 * will handle this.
 	 */
+	spin_lock_irq(&port->lock);
 	up->ier &= ~UART_IER_RDI;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
+	spin_unlock_irq(&port->lock);
 
 	priv->tx_running = false;
 	priv->dma.rx_dma = NULL;
@@ -787,6 +789,12 @@ static int brcmuart_handle_irq(struct uart_port *p)
 		spin_lock_irqsave(&p->lock, flags);
 		status = serial_port_in(p, UART_LSR);
 		if ((status & UART_LSR_DR) == 0) {
+			bool is_console;
+
+			is_console = serial8250_is_console(p);
+
+			if (is_console)
+				serial8250_enter_unsafe(up);
 
 			ier = serial_port_in(p, UART_IER);
 			/*
@@ -807,6 +815,9 @@ static int brcmuart_handle_irq(struct uart_port *p)
 				serial_port_in(p, UART_RX);
 			}
 
+			if (is_console)
+				serial8250_exit_unsafe(up);
+
 			handled = 1;
 		}
 		spin_unlock_irqrestore(&p->lock, flags);
@@ -844,12 +855,22 @@ static enum hrtimer_restart brcmuart_hrtimer_func(struct hrtimer *t)
 	/* re-enable receive unless upper layer has disabled it */
 	if ((up->ier & (UART_IER_RLSI | UART_IER_RDI)) ==
 	    (UART_IER_RLSI | UART_IER_RDI)) {
+		bool is_console;
+
+		is_console = serial8250_is_console(p);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		status = serial_port_in(p, UART_IER);
 		status |= (UART_IER_RLSI | UART_IER_RDI);
 		serial_port_out(p, UART_IER, status);
 		status = serial_port_in(p, UART_MCR);
 		status |= UART_MCR_RTS;
 		serial_port_out(p, UART_MCR, status);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
 	}
 	spin_unlock_irqrestore(&p->lock, flags);
 	return HRTIMER_NORESTART;
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index ab63c308be0a..6910160a5cec 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -256,6 +256,7 @@ static void serial8250_timeout(struct timer_list *t)
 static void serial8250_backup_timeout(struct timer_list *t)
 {
 	struct uart_8250_port *up = from_timer(up, t, timer);
+	struct uart_port *port = &up->port;
 	unsigned int iir, ier = 0, lsr;
 	unsigned long flags;
 
@@ -266,8 +267,23 @@ static void serial8250_backup_timeout(struct timer_list *t)
 	 * based handler.
 	 */
 	if (up->port.irq) {
+		bool is_console;
+
+		/*
+		 * Do not use serial8250_clear_IER() because this code
+		 * ignores capabilities.
+		 */
+
+		is_console = serial8250_is_console(port);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		ier = serial_in(up, UART_IER);
 		serial_out(up, UART_IER, 0);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
 	}
 
 	iir = serial_in(up, UART_IIR);
@@ -290,7 +306,7 @@ static void serial8250_backup_timeout(struct timer_list *t)
 		serial8250_tx_chars(up);
 
 	if (up->port.irq)
-		serial_out(up, UART_IER, ier);
+		serial8250_set_IER(up, ier);
 
 	spin_unlock_irqrestore(&up->port.lock, flags);
 
@@ -576,12 +592,30 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
 
 #ifdef CONFIG_SERIAL_8250_CONSOLE
 
-static void univ8250_console_write(struct console *co, const char *s,
-				   unsigned int count)
+static void univ8250_console_port_lock(struct console *con, bool do_lock, unsigned long *flags)
+{
+	struct uart_8250_port *up = &serial8250_ports[con->index];
+
+	if (do_lock)
+		spin_lock_irqsave(&up->port.lock, *flags);
+	else
+		spin_unlock_irqrestore(&up->port.lock, *flags);
+}
+
+static bool univ8250_console_write_atomic(struct console *co,
+					  struct nbcon_write_context *wctxt)
+{
+	struct uart_8250_port *up = &serial8250_ports[co->index];
+
+	return serial8250_console_write_atomic(up, wctxt);
+}
+
+static bool univ8250_console_write_thread(struct console *co,
+					  struct nbcon_write_context *wctxt)
 {
 	struct uart_8250_port *up = &serial8250_ports[co->index];
 
-	serial8250_console_write(up, s, count);
+	return serial8250_console_write_thread(up, wctxt);
 }
 
 static int univ8250_console_setup(struct console *co, char *options)
@@ -669,12 +703,14 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
 
 static struct console univ8250_console = {
 	.name		= "ttyS",
-	.write		= univ8250_console_write,
+	.write_atomic	= univ8250_console_write_atomic,
+	.write_thread	= univ8250_console_write_thread,
+	.port_lock	= univ8250_console_port_lock,
 	.device		= uart_console_device,
 	.setup		= univ8250_console_setup,
 	.exit		= univ8250_console_exit,
 	.match		= univ8250_console_match,
-	.flags		= CON_PRINTBUFFER | CON_ANYTIME,
+	.flags		= CON_PRINTBUFFER | CON_ANYTIME | CON_NBCON,
 	.index		= -1,
 	.data		= &serial8250_reg,
 };
@@ -962,7 +998,7 @@ static void serial_8250_overrun_backoff_work(struct work_struct *work)
 	spin_lock_irqsave(&port->lock, flags);
 	up->ier |= UART_IER_RLSI | UART_IER_RDI;
 	up->port.read_status_mask |= UART_LSR_DR;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	spin_unlock_irqrestore(&port->lock, flags);
 }
 
diff --git a/drivers/tty/serial/8250/8250_exar.c b/drivers/tty/serial/8250/8250_exar.c
index 64770c62bbec..ccb70b20b1f4 100644
--- a/drivers/tty/serial/8250/8250_exar.c
+++ b/drivers/tty/serial/8250/8250_exar.c
@@ -185,6 +185,10 @@ static void xr17v35x_set_divisor(struct uart_port *p, unsigned int baud,
 
 static int xr17v35x_startup(struct uart_port *port)
 {
+	struct uart_8250_port *up = up_to_u8250p(port);
+
+	spin_lock_irq(&port->lock);
+
 	/*
 	 * First enable access to IER [7:5], ISR [5:4], FCR [5:4],
 	 * MCR [7:5] and MSR [7:0]
@@ -195,7 +199,9 @@ static int xr17v35x_startup(struct uart_port *port)
 	 * Make sure all interrupts are masked until initialization is
 	 * complete and the FIFOs are cleared
 	 */
-	serial_port_out(port, UART_IER, 0);
+	serial8250_set_IER(up, 0);
+
+	spin_unlock_irq(&port->lock);
 
 	return serial8250_do_startup(port);
 }
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 8adfaa183f77..eaf148245a10 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -58,7 +58,8 @@ int fsl8250_handle_irq(struct uart_port *port)
 	if ((orig_lsr & UART_LSR_OE) && (up->overrun_backoff_time_ms > 0)) {
 		unsigned long delay;
 
-		up->ier = port->serial_in(port, UART_IER);
+		up->ier = serial8250_in_IER(up);
+
 		if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
 			port->ops->stop_rx(port);
 		} else {
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index fb1d5ec0940e..bf7ab55c8923 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -222,12 +222,38 @@ static void mtk8250_shutdown(struct uart_port *port)
 
 static void mtk8250_disable_intrs(struct uart_8250_port *up, int mask)
 {
-	serial_out(up, UART_IER, serial_in(up, UART_IER) & (~mask));
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+	serial_out(up, UART_IER, ier & (~mask));
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
 }
 
 static void mtk8250_enable_intrs(struct uart_8250_port *up, int mask)
 {
-	serial_out(up, UART_IER, serial_in(up, UART_IER) | mask);
+	struct uart_port *port = &up->port;
+	bool is_console;
+	int ier;
+
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
+	ier = serial_in(up, UART_IER);
+	serial_out(up, UART_IER, ier | mask);
+
+	if (is_console)
+		serial8250_exit_unsafe(up);
 }
 
 static void mtk8250_set_flow_ctrl(struct uart_8250_port *up, int mode)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 734f092ef839..bfa50a26349d 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -334,8 +334,7 @@ static void omap8250_restore_regs(struct uart_8250_port *up)
 
 	/* drop TCR + TLR access, we setup XON/XOFF later */
 	serial8250_out_MCR(up, mcr);
-
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	serial_dl_write(up, priv->quot);
@@ -523,16 +522,21 @@ static void omap_8250_pm(struct uart_port *port, unsigned int state,
 	u8 efr;
 
 	pm_runtime_get_sync(port->dev);
+
+	spin_lock_irq(&port->lock);
+
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	efr = serial_in(up, UART_EFR);
 	serial_out(up, UART_EFR, efr | UART_EFR_ECB);
 	serial_out(up, UART_LCR, 0);
 
-	serial_out(up, UART_IER, (state != 0) ? UART_IERX_SLEEP : 0);
+	serial8250_set_IER(up, (state != 0) ? UART_IERX_SLEEP : 0);
 	serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
 	serial_out(up, UART_EFR, efr);
 	serial_out(up, UART_LCR, 0);
 
+	spin_unlock_irq(&port->lock);
+
 	pm_runtime_mark_last_busy(port->dev);
 	pm_runtime_put_autosuspend(port->dev);
 }
@@ -649,7 +653,8 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id)
 	if ((lsr & UART_LSR_OE) && up->overrun_backoff_time_ms > 0) {
 		unsigned long delay;
 
-		up->ier = port->serial_in(port, UART_IER);
+		spin_lock(&port->lock);
+		up->ier = serial8250_in_IER(up);
 		if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
 			port->ops->stop_rx(port);
 		} else {
@@ -658,6 +663,7 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id)
 			 */
 			cancel_delayed_work(&up->overrun_backoff);
 		}
+		spin_unlock(&port->lock);
 
 		delay = msecs_to_jiffies(up->overrun_backoff_time_ms);
 		schedule_delayed_work(&up->overrun_backoff, delay);
@@ -707,8 +713,10 @@ static int omap_8250_startup(struct uart_port *port)
 	if (ret < 0)
 		goto err;
 
+	spin_lock_irq(&port->lock);
 	up->ier = UART_IER_RLSI | UART_IER_RDI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
+	spin_unlock_irq(&port->lock);
 
 #ifdef CONFIG_PM
 	up->capabilities |= UART_CAP_RPM;
@@ -748,8 +756,10 @@ static void omap_8250_shutdown(struct uart_port *port)
 	if (priv->habit & UART_HAS_EFR2)
 		serial_out(up, UART_OMAP_EFR2, 0x0);
 
+	spin_lock_irq(&port->lock);
 	up->ier = 0;
-	serial_out(up, UART_IER, 0);
+	serial8250_set_IER(up, 0);
+	spin_unlock_irq(&port->lock);
 
 	if (up->dma)
 		serial8250_release_dma(up);
@@ -797,7 +807,7 @@ static void omap_8250_unthrottle(struct uart_port *port)
 		up->dma->rx_dma(up);
 	up->ier |= UART_IER_RLSI | UART_IER_RDI;
 	port->read_status_mask |= UART_LSR_DR;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	spin_unlock_irqrestore(&port->lock, flags);
 
 	pm_runtime_mark_last_busy(port->dev);
@@ -956,7 +966,7 @@ static void __dma_rx_complete(void *param)
 	__dma_rx_do_complete(p);
 	if (!priv->throttled) {
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_out(p, UART_IER, p->ier);
+		serial8250_set_IER(p, p->ier);
 		if (!(priv->habit & UART_HAS_EFR2))
 			omap_8250_rx_dma(p);
 	}
@@ -1013,7 +1023,7 @@ static int omap_8250_rx_dma(struct uart_8250_port *p)
 			 * callback to run.
 			 */
 			p->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
-			serial_out(p, UART_IER, p->ier);
+			serial8250_set_IER(p, p->ier);
 		}
 		goto out;
 	}
@@ -1226,12 +1236,12 @@ static void am654_8250_handle_rx_dma(struct uart_8250_port *up, u8 iir,
 		 * periodic timeouts, re-enable interrupts.
 		 */
 		up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
-		serial_out(up, UART_IER, up->ier);
+		serial8250_set_IER(up, up->ier);
 		omap_8250_rx_dma_flush(up);
 		serial_in(up, UART_IIR);
 		serial_out(up, UART_OMAP_EFR2, 0x0);
 		up->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_out(up, UART_IER, up->ier);
+		serial8250_set_IER(up, up->ier);
 	}
 }
 
@@ -1717,12 +1727,16 @@ static int omap8250_runtime_resume(struct device *dev)
 
 	up = serial8250_get_port(priv->line);
 
+	spin_lock_irq(&up->port.lock);
+
 	if (omap8250_lost_context(up))
 		omap8250_restore_regs(up);
 
 	if (up->dma && up->dma->rxchan && !(priv->habit & UART_HAS_EFR2))
 		omap_8250_rx_dma(up);
 
+	spin_unlock_irq(&up->port.lock);
+
 	priv->latency = priv->calc_latency;
 	schedule_work(&priv->qos_work);
 	return 0;
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..4378349862e6 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -744,6 +744,7 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
 	serial8250_rpm_get(p);
 
 	if (p->capabilities & UART_CAP_SLEEP) {
+		spin_lock_irq(&p->port.lock);
 		if (p->capabilities & UART_CAP_EFR) {
 			lcr = serial_in(p, UART_LCR);
 			efr = serial_in(p, UART_EFR);
@@ -751,25 +752,18 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
 			serial_out(p, UART_EFR, UART_EFR_ECB);
 			serial_out(p, UART_LCR, 0);
 		}
-		serial_out(p, UART_IER, sleep ? UART_IERX_SLEEP : 0);
+		serial8250_set_IER(p, sleep ? UART_IERX_SLEEP : 0);
 		if (p->capabilities & UART_CAP_EFR) {
 			serial_out(p, UART_LCR, UART_LCR_CONF_MODE_B);
 			serial_out(p, UART_EFR, efr);
 			serial_out(p, UART_LCR, lcr);
 		}
+		spin_unlock_irq(&p->port.lock);
 	}
 
 	serial8250_rpm_put(p);
 }
 
-static void serial8250_clear_IER(struct uart_8250_port *up)
-{
-	if (up->capabilities & UART_CAP_UUE)
-		serial_out(up, UART_IER, UART_IER_UUE);
-	else
-		serial_out(up, UART_IER, 0);
-}
-
 #ifdef CONFIG_SERIAL_8250_RSA
 /*
  * Attempts to turn on the RSA FIFO.  Returns zero on failure.
@@ -1033,8 +1027,10 @@ static int broken_efr(struct uart_8250_port *up)
  */
 static void autoconfig_16550a(struct uart_8250_port *up)
 {
+	struct uart_port *port = &up->port;
 	unsigned char status1, status2;
 	unsigned int iersave;
+	bool is_console;
 
 	up->port.type = PORT_16550A;
 	up->capabilities |= UART_CAP_FIFO;
@@ -1150,6 +1146,11 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 		return;
 	}
 
+	is_console = serial8250_is_console(port);
+
+	if (is_console)
+		serial8250_enter_unsafe(up);
+
 	/*
 	 * Try writing and reading the UART_IER_UUE bit (b6).
 	 * If it works, this is probably one of the Xscale platform's
@@ -1185,6 +1186,9 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 	}
 	serial_out(up, UART_IER, iersave);
 
+	if (is_console)
+		serial8250_exit_unsafe(up);
+
 	/*
 	 * We distinguish between 16550A and U6 16550A by counting
 	 * how many bytes are in the FIFO.
@@ -1226,6 +1230,13 @@ static void autoconfig(struct uart_8250_port *up)
 	up->bugs = 0;
 
 	if (!(port->flags & UPF_BUGGY_UART)) {
+		bool is_console;
+
+		is_console = serial8250_is_console(port);
+
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		/*
 		 * Do a simple existence test first; if we fail this,
 		 * there's no point trying anything else.
@@ -1255,6 +1266,10 @@ static void autoconfig(struct uart_8250_port *up)
 #endif
 		scratch3 = serial_in(up, UART_IER) & UART_IER_ALL_INTR;
 		serial_out(up, UART_IER, scratch);
+
+		if (is_console)
+			serial8250_exit_unsafe(up);
+
 		if (scratch2 != 0 || scratch3 != UART_IER_ALL_INTR) {
 			/*
 			 * We failed; there's nothing here
@@ -1376,6 +1391,7 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	unsigned char save_ICP = 0;
 	unsigned int ICP = 0;
 	unsigned long irqs;
+	bool is_console;
 	int irq;
 
 	if (port->flags & UPF_FOURPORT) {
@@ -1385,8 +1401,12 @@ static void autoconfig_irq(struct uart_8250_port *up)
 		inb_p(ICP);
 	}
 
-	if (uart_console(port))
+	is_console = serial8250_is_console(port);
+
+	if (is_console) {
 		console_lock();
+		serial8250_enter_unsafe(up);
+	}
 
 	/* forget possible initially masked and pending IRQ */
 	probe_irq_off(probe_irq_on());
@@ -1418,8 +1438,10 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	if (port->flags & UPF_FOURPORT)
 		outb_p(save_ICP, ICP);
 
-	if (uart_console(port))
+	if (is_console) {
+		serial8250_exit_unsafe(up);
 		console_unlock();
+	}
 
 	port->irq = (irq > 0) ? irq : 0;
 }
@@ -1432,7 +1454,7 @@ static void serial8250_stop_rx(struct uart_port *port)
 
 	up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
 	up->port.read_status_mask &= ~UART_LSR_DR;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	serial8250_rpm_put(up);
 }
@@ -1462,7 +1484,7 @@ void serial8250_em485_stop_tx(struct uart_8250_port *p)
 		serial8250_clear_and_reinit_fifos(p);
 
 		p->ier |= UART_IER_RLSI | UART_IER_RDI;
-		serial_port_out(&p->port, UART_IER, p->ier);
+		serial8250_set_IER(p, p->ier);
 	}
 }
 EXPORT_SYMBOL_GPL(serial8250_em485_stop_tx);
@@ -1709,7 +1731,7 @@ static void serial8250_disable_ms(struct uart_port *port)
 	mctrl_gpio_disable_ms(up->gpios);
 
 	up->ier &= ~UART_IER_MSI;
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 }
 
 static void serial8250_enable_ms(struct uart_port *port)
@@ -1725,7 +1747,7 @@ static void serial8250_enable_ms(struct uart_port *port)
 	up->ier |= UART_IER_MSI;
 
 	serial8250_rpm_get(up);
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	serial8250_rpm_put(up);
 }
 
@@ -2160,9 +2182,10 @@ static void serial8250_put_poll_char(struct uart_port *port,
 	serial8250_rpm_get(up);
 	/*
 	 *	First save the IER then disable the interrupts
+	 *
+	 *	Best-effort IER access because other CPUs are quiesced.
 	 */
-	ier = serial_port_in(port, UART_IER);
-	serial8250_clear_IER(up);
+	__serial8250_clear_IER(up, NULL, &ier);
 
 	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
 	/*
@@ -2175,7 +2198,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
 	 *	and restore the IER
 	 */
 	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
-	serial_port_out(port, UART_IER, ier);
+	__serial8250_set_IER(up, NULL, ier);
 	serial8250_rpm_put(up);
 }
 
@@ -2186,6 +2209,7 @@ int serial8250_do_startup(struct uart_port *port)
 	struct uart_8250_port *up = up_to_u8250p(port);
 	unsigned long flags;
 	unsigned char iir;
+	bool is_console;
 	int retval;
 	u16 lsr;
 
@@ -2203,21 +2227,25 @@ int serial8250_do_startup(struct uart_port *port)
 	serial8250_rpm_get(up);
 	if (port->type == PORT_16C950) {
 		/* Wake up and initialize UART */
+		spin_lock_irqsave(&port->lock, flags);
 		up->acr = 0;
 		serial_port_out(port, UART_LCR, UART_LCR_CONF_MODE_B);
 		serial_port_out(port, UART_EFR, UART_EFR_ECB);
-		serial_port_out(port, UART_IER, 0);
+		serial8250_set_IER(up, 0);
 		serial_port_out(port, UART_LCR, 0);
 		serial_icr_write(up, UART_CSR, 0); /* Reset the UART */
 		serial_port_out(port, UART_LCR, UART_LCR_CONF_MODE_B);
 		serial_port_out(port, UART_EFR, UART_EFR_ECB);
 		serial_port_out(port, UART_LCR, 0);
+		spin_unlock_irqrestore(&port->lock, flags);
 	}
 
 	if (port->type == PORT_DA830) {
 		/* Reset the port */
-		serial_port_out(port, UART_IER, 0);
+		spin_lock_irqsave(&port->lock, flags);
+		serial8250_set_IER(up, 0);
 		serial_port_out(port, UART_DA830_PWREMU_MGMT, 0);
+		spin_unlock_irqrestore(&port->lock, flags);
 		mdelay(10);
 
 		/* Enable Tx, Rx and free run mode */
@@ -2315,6 +2343,8 @@ int serial8250_do_startup(struct uart_port *port)
 	if (retval)
 		goto out;
 
+	is_console = serial8250_is_console(port);
+
 	if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
 		unsigned char iir1;
 
@@ -2331,6 +2361,9 @@ int serial8250_do_startup(struct uart_port *port)
 		 */
 		spin_lock_irqsave(&port->lock, flags);
 
+		if (is_console)
+			serial8250_enter_unsafe(up);
+
 		wait_for_xmitr(up, UART_LSR_THRE);
 		serial_port_out_sync(port, UART_IER, UART_IER_THRI);
 		udelay(1); /* allow THRE to set */
@@ -2341,6 +2374,9 @@ int serial8250_do_startup(struct uart_port *port)
 		iir = serial_port_in(port, UART_IIR);
 		serial_port_out(port, UART_IER, 0);
 
+		if (is_console)
+			serial8250_exit_unsafe(up);
+
 		spin_unlock_irqrestore(&port->lock, flags);
 
 		if (port->irqflags & IRQF_SHARED)
@@ -2395,10 +2431,14 @@ int serial8250_do_startup(struct uart_port *port)
 	 * Do a quick test to see if we receive an interrupt when we enable
 	 * the TX irq.
 	 */
+	if (is_console)
+		serial8250_enter_unsafe(up);
 	serial_port_out(port, UART_IER, UART_IER_THRI);
 	lsr = serial_port_in(port, UART_LSR);
 	iir = serial_port_in(port, UART_IIR);
 	serial_port_out(port, UART_IER, 0);
+	if (is_console)
+		serial8250_exit_unsafe(up);
 
 	if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
 		if (!(up->bugs & UART_BUG_TXEN)) {
@@ -2430,7 +2470,7 @@ int serial8250_do_startup(struct uart_port *port)
 	if (up->dma) {
 		const char *msg = NULL;
 
-		if (uart_console(port))
+		if (is_console)
 			msg = "forbid DMA for kernel console";
 		else if (serial8250_request_dma(up))
 			msg = "failed to request DMA";
@@ -2481,7 +2521,7 @@ void serial8250_do_shutdown(struct uart_port *port)
 	 */
 	spin_lock_irqsave(&port->lock, flags);
 	up->ier = 0;
-	serial_port_out(port, UART_IER, 0);
+	serial8250_set_IER(up, 0);
 	spin_unlock_irqrestore(&port->lock, flags);
 
 	synchronize_irq(port->irq);
@@ -2847,7 +2887,7 @@ serial8250_do_set_termios(struct uart_port *port, struct ktermios *termios,
 	if (up->capabilities & UART_CAP_RTOIE)
 		up->ier |= UART_IER_RTOIE;
 
-	serial_port_out(port, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 
 	if (up->capabilities & UART_CAP_EFR) {
 		unsigned char efr = 0;
@@ -3312,12 +3352,21 @@ EXPORT_SYMBOL_GPL(serial8250_set_defaults);
 
 #ifdef CONFIG_SERIAL_8250_CONSOLE
 
-static void serial8250_console_putchar(struct uart_port *port, unsigned char ch)
+static bool serial8250_console_putchar(struct uart_port *port, unsigned char ch,
+				       struct nbcon_write_context *wctxt)
 {
 	struct uart_8250_port *up = up_to_u8250p(port);
 
 	wait_for_xmitr(up, UART_LSR_THRE);
+	if (!nbcon_can_proceed(wctxt))
+		return false;
 	serial_port_out(port, UART_TX, ch);
+	up->console_newline_needed = (ch != '\n');
+
+	return true;
 }
 
 /*
@@ -3346,33 +3395,119 @@ static void serial8250_console_restore(struct uart_8250_port *up)
 	serial8250_out_MCR(up, up->mcr | UART_MCR_DTR | UART_MCR_RTS);
 }
 
-/*
- * Print a string to the serial port using the device FIFO
- *
- * It sends fifosize bytes and then waits for the fifo
- * to get empty.
- */
-static void serial8250_console_fifo_write(struct uart_8250_port *up,
-					  const char *s, unsigned int count)
+static bool __serial8250_console_write(struct uart_port *port, struct nbcon_write_context *wctxt,
+		const char *s, unsigned int count,
+		bool (*putchar)(struct uart_port *, unsigned char, struct nbcon_write_context *))
 {
-	int i;
-	const char *end = s + count;
-	unsigned int fifosize = up->tx_loadsz;
-	bool cr_sent = false;
-
-	while (s != end) {
-		wait_for_lsr(up, UART_LSR_THRE);
-
-		for (i = 0; i < fifosize && s != end; ++i) {
-			if (*s == '\n' && !cr_sent) {
-				serial_out(up, UART_TX, '\r');
-				cr_sent = true;
-			} else {
-				serial_out(up, UART_TX, *s++);
-				cr_sent = false;
-			}
+	bool finished = false;
+	unsigned int i;
+
+	for (i = 0; i < count; i++, s++) {
+		if (*s == '\n') {
+			if (!putchar(port, '\r', wctxt))
+				goto out;
+		}
+		if (!putchar(port, *s, wctxt))
+			goto out;
+	}
+	finished = true;
+out:
+	return finished;
+}
+
+static bool serial8250_console_write(struct uart_port *port, struct nbcon_write_context *wctxt,
+		const char *s, unsigned int count,
+		bool (*putchar)(struct uart_port *, unsigned char, struct nbcon_write_context *))
+{
+	return __serial8250_console_write(port, wctxt, s, count, putchar);
+}
+
+static bool atomic_print_line(struct uart_8250_port *up,
+			      struct nbcon_write_context *wctxt)
+{
+	struct uart_port *port = &up->port;
+
+	if (up->console_newline_needed &&
+	    !__serial8250_console_write(port, wctxt, "\n", 1, serial8250_console_putchar)) {
+		return false;
+	}
+
+	return __serial8250_console_write(port, wctxt, wctxt->outbuf, wctxt->len,
+					  serial8250_console_putchar);
+}
+
+static void atomic_console_reacquire(struct nbcon_write_context *wctxt,
+				     struct nbcon_write_context *wctxt_init)
+{
+	memcpy(wctxt, wctxt_init, sizeof(*wctxt));
+	while (!nbcon_try_acquire(wctxt)) {
+		cpu_relax();
+		memcpy(wctxt, wctxt_init, sizeof(*wctxt));
+	}
+}
+
+bool serial8250_console_write_atomic(struct uart_8250_port *up,
+				     struct nbcon_write_context *wctxt)
+{
+	struct nbcon_write_context wctxt_init = { };
+	struct nbcon_context *ctxt_init = &ACCESS_PRIVATE(&wctxt_init, ctxt);
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	bool finished = false;
+	unsigned int ier;
+
+	touch_nmi_watchdog();
+
+	/* With write_atomic, another context may hold the port->lock. */
+
+	ctxt_init->console = ctxt->console;
+	ctxt_init->prio = ctxt->prio;
+	ctxt_init->thread = ctxt->thread;
+
+	/*
+	 * Enter unsafe in order to disable interrupts. If the console is
+	 * lost before the interrupts are disabled, bail out because another
+	 * context took over the printing. If the console is lost after the
+	 * interrupts are disabled, the console must be reacquired in order
+	 * to re-enable the interrupts. However in that case no printing is
+	 * allowed because another context took over the printing.
+	 */
+
+	if (!nbcon_enter_unsafe(wctxt))
+		return false;
+
+	if (!__serial8250_clear_IER(up, wctxt, &ier))
+		return false;
+
+	if (!nbcon_exit_unsafe(wctxt)) {
+		atomic_console_reacquire(wctxt, &wctxt_init);
+		goto enable_irq;
+	}
+
+	if (!atomic_print_line(up, wctxt)) {
+		atomic_console_reacquire(wctxt, &wctxt_init);
+		goto enable_irq;
+	}
+
+	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
+	finished = true;
+enable_irq:
+	/*
+	 * Enter unsafe in order to enable interrupts. If the console is
+	 * lost before the interrupts are enabled, the console must be
+	 * reacquired in order to re-enable the interrupts.
+	 */
+	for (;;) {
+		if (nbcon_enter_unsafe(wctxt) &&
+		    __serial8250_set_IER(up, wctxt, ier)) {
+			break;
 		}
+
+		/* HW-IRQs still disabled. Reacquire to enable them. */
+		atomic_console_reacquire(wctxt, &wctxt_init);
 	}
+	nbcon_exit_unsafe(wctxt);
+
+	return finished;
 }
 
 /*
@@ -3384,78 +3519,116 @@ static void serial8250_console_fifo_write(struct uart_8250_port *up,
  *	Doing runtime PM is really a bad idea for the kernel console.
  *	Thus, we assume the function is called when device is powered up.
  */
-void serial8250_console_write(struct uart_8250_port *up, const char *s,
-			      unsigned int count)
+bool serial8250_console_write_thread(struct uart_8250_port *up,
+				     struct nbcon_write_context *wctxt)
 {
+	struct nbcon_write_context wctxt_init = { };
+	struct nbcon_context *ctxt_init = &ACCESS_PRIVATE(&wctxt_init, ctxt);
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 	struct uart_8250_em485 *em485 = up->em485;
 	struct uart_port *port = &up->port;
-	unsigned long flags;
-	unsigned int ier, use_fifo;
-	int locked = 1;
-
-	touch_nmi_watchdog();
+	unsigned int count = wctxt->len;
+	const char *s = wctxt->outbuf;
+	bool rs485_started = false;
+	bool finished = false;
+	unsigned int ier;
 
-	if (oops_in_progress)
-		locked = spin_trylock_irqsave(&port->lock, flags);
-	else
-		spin_lock_irqsave(&port->lock, flags);
+	ctxt_init->console = ctxt->console;
+	ctxt_init->prio = ctxt->prio;
+	ctxt_init->thread = ctxt->thread;
 
 	/*
-	 *	First save the IER then disable the interrupts
+	 * Enter unsafe in order to disable interrupts. If the console is
+	 * lost before the interrupts are disabled, bail out because another
+	 * context took over the printing. If the console is lost after the
+	 * interrupts are disabled, the console must be reacquired in order
+	 * to re-enable the interrupts. However in that case no printing is
+	 * allowed because another context took over the printing.
 	 */
-	ier = serial_port_in(port, UART_IER);
-	serial8250_clear_IER(up);
+
+	if (!nbcon_enter_unsafe(wctxt))
+		return false;
+
+	if (!__serial8250_clear_IER(up, wctxt, &ier))
+		return false;
+
+	if (!nbcon_exit_unsafe(wctxt)) {
+		atomic_console_reacquire(wctxt, &wctxt_init);
+		goto enable_irq;
+	}
 
 	/* check scratch reg to see if port powered off during system sleep */
 	if (up->canary && (up->canary != serial_port_in(port, UART_SCR))) {
+		if (!nbcon_enter_unsafe(wctxt)) {
+			atomic_console_reacquire(wctxt, &wctxt_init);
+			goto enable_irq;
+		}
 		serial8250_console_restore(up);
+		if (!nbcon_exit_unsafe(wctxt)) {
+			atomic_console_reacquire(wctxt, &wctxt_init);
+			goto enable_irq;
+		}
 		up->canary = 0;
 	}
 
 	if (em485) {
-		if (em485->tx_stopped)
+		if (em485->tx_stopped) {
+			if (!nbcon_enter_unsafe(wctxt)) {
+				atomic_console_reacquire(wctxt, &wctxt_init);
+				goto enable_irq;
+			}
 			up->rs485_start_tx(up);
-		mdelay(port->rs485.delay_rts_before_send);
+			rs485_started = true;
+			if (!nbcon_exit_unsafe(wctxt)) {
+				atomic_console_reacquire(wctxt, &wctxt_init);
+				goto enable_irq;
+			}
+		}
+		if (port->rs485.delay_rts_before_send) {
+			mdelay(port->rs485.delay_rts_before_send);
+			if (!nbcon_can_proceed(wctxt)) {
+				atomic_console_reacquire(wctxt, &wctxt_init);
+				goto enable_irq;
+			}
+		}
 	}
 
-	use_fifo = (up->capabilities & UART_CAP_FIFO) &&
-		/*
-		 * BCM283x requires to check the fifo
-		 * after each byte.
-		 */
-		!(up->capabilities & UART_CAP_MINI) &&
-		/*
-		 * tx_loadsz contains the transmit fifo size
-		 */
-		up->tx_loadsz > 1 &&
-		(up->fcr & UART_FCR_ENABLE_FIFO) &&
-		port->state &&
-		test_bit(TTY_PORT_INITIALIZED, &port->state->port.iflags) &&
-		/*
-		 * After we put a data in the fifo, the controller will send
-		 * it regardless of the CTS state. Therefore, only use fifo
-		 * if we don't use control flow.
-		 */
-		!(up->port.flags & UPF_CONS_FLOW);
-
-	if (likely(use_fifo))
-		serial8250_console_fifo_write(up, s, count);
-	else
-		uart_console_write(port, s, count, serial8250_console_putchar);
+	if (!serial8250_console_write(port, wctxt, s, count, serial8250_console_putchar)) {
+		atomic_console_reacquire(wctxt, &wctxt_init);
+		goto enable_irq;
+	}
 
+	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
+	finished = true;
+enable_irq:
 	/*
-	 *	Finally, wait for transmitter to become empty
-	 *	and restore the IER
+	 * Enter unsafe in order to stop rs485_tx. If the console is
+	 * lost before the rs485_tx is stopped, the console must be
+	 * reacquired in order to stop rs485_tx.
 	 */
-	wait_for_xmitr(up, UART_LSR_BOTH_EMPTY);
-
 	if (em485) {
 		mdelay(port->rs485.delay_rts_after_send);
-		if (em485->tx_stopped)
+		if (em485->tx_stopped && rs485_started) {
+			while (!nbcon_enter_unsafe(wctxt))
+				atomic_console_reacquire(wctxt, &wctxt_init);
 			up->rs485_stop_tx(up);
+			if (!nbcon_exit_unsafe(wctxt))
+				atomic_console_reacquire(wctxt, &wctxt_init);
+		}
 	}
 
-	serial_port_out(port, UART_IER, ier);
+	/*
+	 * Enter unsafe in order to enable interrupts. If the console is
+	 * lost before the interrupts are enabled, the console must be
+	 * reacquired in order to re-enable the interrupts.
+	 */
+	for (;;) {
+		if (nbcon_enter_unsafe(wctxt) &&
+		    __serial8250_set_IER(up, wctxt, ier)) {
+			break;
+		}
+		atomic_console_reacquire(wctxt, &wctxt_init);
+	}
 
 	/*
 	 *	The receive handling will happen properly because the
@@ -3467,8 +3640,9 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
 	if (up->msr_saved_flags)
 		serial8250_modem_status(up);
 
-	if (locked)
-		spin_unlock_irqrestore(&port->lock, flags);
+	nbcon_exit_unsafe(wctxt);
+
+	return finished;
 }
 
 static unsigned int probe_baud(struct uart_port *port)
@@ -3488,6 +3662,7 @@ static unsigned int probe_baud(struct uart_port *port)
 
 int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
 {
+	struct uart_8250_port *up = up_to_u8250p(port);
 	int baud = 9600;
 	int bits = 8;
 	int parity = 'n';
@@ -3497,6 +3672,8 @@ int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
 	if (!port->iobase && !port->membase)
 		return -ENODEV;
 
+	up->console_newline_needed = false;
+
 	if (options)
 		uart_parse_options(options, &baud, &parity, &bits, &flow);
 	else if (probe)
diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
index 5313aa31930f..16715f01bdb5 100644
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -9,6 +9,7 @@ config SERIAL_8250
 	depends on !S390
 	select SERIAL_CORE
 	select SERIAL_MCTRL_GPIO if GPIOLIB
+	select HAVE_ATOMIC_CONSOLE
 	help
 	  This selects whether you want to include the driver for the standard
 	  serial ports.  The standard answer is Y.  People who might say N
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2bd32c8ece39..9901f916dc1a 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2336,8 +2336,11 @@ int uart_suspend_port(struct uart_driver *drv, struct uart_port *uport)
 	 * able to Re-start_rx later.
 	 */
 	if (!console_suspend_enabled && uart_console(uport)) {
-		if (uport->ops->start_rx)
+		if (uport->ops->start_rx) {
+			spin_lock_irq(&uport->lock);
 			uport->ops->stop_rx(uport);
+			spin_unlock_irq(&uport->lock);
+		}
 		goto unlock;
 	}
 
@@ -2430,8 +2433,11 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
 		if (console_suspend_enabled)
 			uart_change_pm(state, UART_PM_STATE_ON);
 		uport->ops->set_termios(uport, &termios, NULL);
-		if (!console_suspend_enabled && uport->ops->start_rx)
+		if (!console_suspend_enabled && uport->ops->start_rx) {
+			spin_lock_irq(&uport->lock);
 			uport->ops->start_rx(uport);
+			spin_unlock_irq(&uport->lock);
+		}
 		if (console_suspend_enabled)
 			console_start(uport->cons);
 	}
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 19376bee9667..cf73a99232d4 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -125,6 +125,8 @@ struct uart_8250_port {
 #define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA
 	unsigned char		msr_saved_flags;
 
+	bool			console_newline_needed;
+
 	struct uart_8250_dma	*dma;
 	const struct uart_8250_ops *ops;
 
@@ -139,6 +141,9 @@ struct uart_8250_port {
 	/* Serial port overrun backoff */
 	struct delayed_work overrun_backoff;
 	u32 overrun_backoff_time_ms;
+
+	struct nbcon_write_context wctxt;
+	int cookie;
 };
 
 static inline struct uart_8250_port *up_to_u8250p(struct uart_port *up)
@@ -178,8 +183,10 @@ void serial8250_tx_chars(struct uart_8250_port *up);
 unsigned int serial8250_modem_status(struct uart_8250_port *up);
 void serial8250_init_port(struct uart_8250_port *up);
 void serial8250_set_defaults(struct uart_8250_port *up);
-void serial8250_console_write(struct uart_8250_port *up, const char *s,
-			      unsigned int count);
+bool serial8250_console_write_atomic(struct uart_8250_port *up,
+				     struct nbcon_write_context *wctxt);
+bool serial8250_console_write_thread(struct uart_8250_port *up,
+				     struct nbcon_write_context *wctxt);
 int serial8250_console_setup(struct uart_port *port, char *options, bool probe);
 int serial8250_console_exit(struct uart_port *port);
 
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: locking API: was: [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-28 13:57     ` John Ogness
@ 2023-03-28 15:10       ` Petr Mladek
  2023-03-28 21:47         ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-28 15:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

On Tue 2023-03-28 16:03:36, John Ogness wrote:
> On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
> >> +	if (!__serial8250_clear_IER(up, wctxt, &ier))
> >> +		return false;
> >> +
> >> +	if (console_exit_unsafe(wctxt)) {
> >> +		can_print = atomic_print_line(up, wctxt);
> >> +		if (!can_print)
> >> +			atomic_console_reacquire(wctxt, &wctxt_init);
> >
> > I am trying to review the 9th patch adding console_can_proceed(),
> > console_enter_unsafe(), console_exit_unsafe() API. And I wanted
> > to see how the struct cons_write_context was actually used.
> 
> First off, I need to post the latest version of the 8250-POC patch. It
> is not officially part of this series and is still going through changes
> for the PREEMPT_RT tree. I will post the latest version directly after
> answering this email.

Sure. I know that it is just a kind of POC.

> > I am confused now. I do not understand the motivation for the extra
> > @wctxt_init copy and atomic_console_reacquire().
> 
> If an atomic context loses ownership while doing certain activities, it
> may need to re-acquire ownership in order to finish or cleanup what it
> started.

This sounds suspicious. If a console/writer context has lost the lock
then all shared/locked resources might already be used by the new
owner.

I would expect that the context could touch only non-shared resources after
losing the lock.

If it re-acquires the lock then the shared resource might be in
another state. So, doing any further changes might be dangerous.

I could imagine that incrementing/decrementing some counter might
make sense but setting some value sounds strange.


> > Why do we need a copy?
> 
> When ownership is lost, the context is cleared. In order to re-acquire,
> an original copy of the context is needed. There is no technical reason
> to clear the context, so maybe the context should not be cleared after a
> takeover. Otherwise, many drivers will need to implement the "backup
> copy" solution.

It might make sense to clear values that are no longer valid, e.g.
some state values or .len of the buffer. But I would keep the values
that might still be needed to re-acquire the lock. That might be
needed when the context wants to restart the entire operation.

I guess that you wanted to clean the structure to catch potential
misuse. It makes some sense but the copying is really weird.

I think that we might/should add some paranoid checks into all
functions manipulating the shared state instead.
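
For instance, at the top of each such function (just a sketch;
cons_state_read(), cons_state_full_match() and CON_STATE_CUR are
from this series):

	static bool cons_check_owner(struct console *con,
				     struct cons_context *ctxt)
	{
		struct cons_state state;

		cons_state_read(con, CON_STATE_CUR, &state);

		/* catch a context still in use after losing the lock */
		return !WARN_ON_ONCE(!cons_state_full_match(state, ctxt->state));
	}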


> > And why we need to reacquire it?
> 
> In this particular case the context has disabled interrupts. No other
> context will re-enable interrupts because the driver is implemented such
> that the one who disables is the one who enables. So this context must
> re-acquire ownership in order to re-enable interrupts.

My understanding is that the driver might lose the lock only
during a hostile takeover. Is it safe to re-enable interrupts
in this case?

Well, it actually might make sense if the interrupts should
be enabled when the port is unused.

Well, I guess that they will get enabled by the other hostile
owner. It should leave the serial port in a good state when
it releases the lock in a normal way.

Anyway, thanks a lot for the info. I still have to wrap my
head around this.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: locking API: was: [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-28 15:10       ` Petr Mladek
@ 2023-03-28 21:47         ` John Ogness
  2023-03-29  8:03           ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-28 21:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
>> If an atomic context loses ownership while doing certain activities,
>> it may need to re-acquire ownership in order to finish or cleanup
>> what it started.
>
> This sounds suspicious. If a console/writer context has lost the lock
> then all shared/locked resources might already be used by the new
> owner.

Correct.

> I would expect that the context could touch only non-shared resources
> after loosing the lock.

Correct.

> If it re-acquires the lock then the shared resource might be in
> another state. So, doing any further changes might be dangerous.

It is the responsibility of the driver to implement this safely, if it
is needed.

> I could imagine that incrementing/decrementing some counter might
> make sense but setting some value sounds strange.

The 8250 driver must disable interrupts before writing to the TX
FIFO. After writing it re-enables the interrupts. However, it might be
the case that the interrupts were already disabled, in which case after
writing they are left disabled.

IOW, whatever context disabled the interrupts is the context that is
expected to re-enable them. This simple rule makes it really easy to
handle nested printing because a context is only concerned with
restoring the IER state that it originally saw.

Using counters or passing around interrupt re-enabling responsibility
would be considerably trickier.
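
In code form the rule is roughly this (a sketch based on the POC patch;
wctxt_init is the backup copy described earlier, error paths trimmed):

	unsigned int ier;

	/* save and clear IER; bail out if ownership was already lost */
	if (!__serial8250_clear_IER(up, wctxt, &ier))
		return false;

	/* ... write to the TX FIFO ... */

	/*
	 * Restore exactly the value this context saw. If interrupts
	 * were already disabled in @ier, they simply stay disabled.
	 */
	for (;;) {
		if (nbcon_enter_unsafe(wctxt) &&
		    __serial8250_set_IER(up, wctxt, ier)) {
			break;
		}
		atomic_console_reacquire(wctxt, &wctxt_init);
	}
	nbcon_exit_unsafe(wctxt);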

>>> And why we need to reacquire it?
>> 
>> In this particular case the context has disabled interrupts. No other
>> context will re-enable interrupts because the driver is implemented
>> such that the one who disables is the one who enables. So this
>> context must re-acquire ownership in order to re-enable interrupts.
>
> My understanding is that the driver might lose the lock only
> during hostile takeover. Is it safe to re-enable interrupts
> in this case?

Your understanding is incorrect. If a more important outputting context
should arrive, the less important outputting context will happily and
kindly hand over to the higher priority context. From the perspective of the
atomic console driver, it lost ownership.

Simple example: The kthread printer is printing and some WARN_ON() is
triggered on another CPU. The warning will be output at a higher
priority and print from the context/CPU of the WARN_ON(). The kthread
printer will lose its ownership by handing over to the warning CPU.

Note that we are _not_ talking about when the unsafe bit is set. We are
talking about a printer that owns the console, is in a safe section, and
loses ownership. If that context was the one that disabled interrupts,
it needs to re-acquire the console in order to safely re-enable the
interrupts. The context that took over ownership saw that interrupts are
disabled and does _not_ re-enable them when it is finished printing.
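
Concretely, with the POC helpers this falls out of the save/restore
semantics. A hypothetical interleaving for the new owner (the WARN_ON()
CPU):

	__serial8250_clear_IER(up, wctxt, &ier);  /* reads the already-cleared value */
	/* ... print the warning ... */
	__serial8250_set_IER(up, wctxt, ier);     /* restores that same value */

The new owner only ever restores the IER value it read itself, so
interrupts disabled by the interrupted context stay disabled until that
context re-acquires the console and restores its own saved value.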

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: locking API: was: [PATCH printk v1 00/18] serial: 8250: implement non-BKL console
  2023-03-28 21:47         ` John Ogness
@ 2023-03-29  8:03           ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-29  8:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Aaron Tomlin, Luis Chamberlain, kgdb-bugreport,
	Greg Kroah-Hartman, linux-fsdevel, Andrew Morton,
	Guilherme G. Piccoli, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu

On Tue 2023-03-28 23:53:16, John Ogness wrote:
> On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
> >> If an atomic context loses ownership while doing certain activities,
> >> it may need to re-acquire ownership in order to finish or cleanup
> >> what it started.
> >
> > This sounds suspicious. If a console/writer context has lost the lock
> > then all shared/locked resources might already be used by the new
> > owner.
> 
> Correct.
> 
> > I would expect that the context could touch only non-shared resources
> > after losing the lock.
> 
> Correct.
> 
> The 8250 driver must disable interrupts before writing to the TX
> FIFO. After writing it re-enables the interrupts. However, it might be
> the case that the interrupts were already disabled, in which case after
> writing they are left disabled.

I see. The reacquire() makes sense now.

Thanks a lot for explanation.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* buffer write race: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions
  2023-03-02 19:56 ` [PATCH printk v1 09/18] printk: nobkl: Add print state functions John Ogness
@ 2023-03-29 13:58   ` Petr Mladek
  2023-03-29 14:33     ` John Ogness
  2023-03-29 14:05   ` misc details: was: " Petr Mladek
  1 sibling, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-29 13:58 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:09, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Provide three functions which are related to the safe handover
> mechanism and allow console drivers to denote takeover unsafe
> sections:
> 
>  - console_can_proceed()
> 
>    Invoked by a console driver to check whether a handover request
>    is pending or whether the console was taken over in a hostile
>    fashion.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -947,6 +947,145 @@ static void cons_free_percpu_data(struct console *con)
>  	con->pcpu_data = NULL;
>  }
>  
> +/**
> + * console_can_proceed - Check whether printing can proceed
> + * @wctxt:	The write context that was handed to the write function
> + *
> + * Returns:	True if the state is correct. False if a handover
> + *		has been requested or if the console was taken
> + *		over.
> + *
> + * Must be invoked after the record was dumped into the assigned record
> + * buffer

The word "after" made me think about possible races when the record
buffer is being filled. The owner might lose the lock in a hostile
way during this action. And we should prevent using the same buffer
when the other owner is still modifying the content.

It should be safe when the same buffer might be used only by nested
contexts. It does not matter if the outer context finishes writing
later. The nested context should not need the buffer anymore.

But a problem might happen when the same buffer is shared between
several non-nested contexts. One context might lose the lock in a hostile way.
The other context might get access after the hostile context
has released the lock.

NORMAL and PANIC contexts are safe. These priorities have only
one context and both have their own buffers.

A problem might be with EMERGENCY contexts. Each CPU might have
its own EMERGENCY context. We might prevent this problem if
we do not allow acquiring the lock in EMERGENCY (and NORMAL)
context when panic() is running or after the first hostile
takeover.

If we want to detect these problems and always be on the safe side,
we might need to add a flag 1:1 connected with the buffers.

We either could put a flag into struct printk_buffers. Or we could
bundle this struct into another one for the atomic consoles.
I mean something like:

struct printk_atomic_buffers {
	struct printk_buffers pbufs;
	atomic_t write_lock;
}

And use it in cons_get_record():

#define PRINTK_BUFFER_WRITE_LOCKED_VAL 1

static int cons_get_record(struct cons_write_context *wctxt)
{
	int ret = 0;

	/*
	 * The buffer is not usable when another write context
	 * is still writing the record and lost the lock in a
	 * hostile way.
	 */
	if (WARN_ON_ONCE(cmpxchg_acquire(&wctxt->pabufs->write_lock,
			    0, PRINTK_BUFFER_WRITE_LOCKED_VAL) != 0)) {
		return -EBUSY;
	}

	// Fill the buffers

	if (no new message) {
		ret = -ENOENT;
		goto unlock;
	}

unlock:
	/* Release the write lock */
	atomic_set_release(&wctxt->pabufs->write_lock, 0);
	return ret;
}


Note: This is related to the discussion about the 7th patch but
      it is not the same.

      This mail is only about using the same buffer for the same console.

      The other discussion was also about using the same buffer
      for more consoles. It is even more problematic because
      each console uses its own lock.

      It means that we would still need separate buffer for each
      interrupt context. Nested context might be able to get
      the lock for another console a regular way, see
      https://lore.kernel.org/all/ZBndaSUFd4ipvKwj@alley/

> + * and at appropriate safe places in the driver.  For unsafe driver
> + * sections see console_enter_unsafe().
> + *
> + * When this function returns false then the calling context is not allowed
> + * to go forward and has to back out immediately and carefully. The buffer
> + * content is no longer trusted either and the console lock is no longer
> + * held.
> + */
> +bool console_can_proceed(struct cons_write_context *wctxt)
> +{

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* misc details: was: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions
  2023-03-02 19:56 ` [PATCH printk v1 09/18] printk: nobkl: Add print state functions John Ogness
  2023-03-29 13:58   ` buffer write race: " Petr Mladek
@ 2023-03-29 14:05   ` Petr Mladek
  1 sibling, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-29 14:05 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:09, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Provide three functions which are related to the safe handover
> mechanism and allow console drivers to denote takeover unsafe
> sections:
> 
>  - console_can_proceed()
> 
>    Invoked by a console driver to check whether a handover request
>    is pending or whether the console was taken over in a hostile
>    fashion.
> 
>  - console_enter/exit_unsafe()
> 
>    Invoked by a console driver to denote that the driver output
>    function is about to enter or to leave a critical region where a
>    hostile takeover is unsafe. These functions are also
>    cancellation points.
> 
>    The unsafe state is stored in the console state and allows a
>    takeover attempt to make informed decisions whether to take over
>    and/or output on such a console at all. The unsafe state is also
>    available to the driver in the write context for the
>    atomic_write() output function so the driver can make informed
>    decisions about the required actions or take a special emergency
>    path.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -947,6 +947,145 @@ static void cons_free_percpu_data(struct console *con)
>  	con->pcpu_data = NULL;
>  }
>  
> +/**
> + * console_can_proceed - Check whether printing can proceed
> + * @wctxt:	The write context that was handed to the write function
> + *
> + * Returns:	True if the state is correct. False if a handover
> + *		has been requested or if the console was taken
> + *		over.
> + *
> + * Must be invoked after the record was dumped into the assigned record
> + * buffer and at appropriate safe places in the driver.  For unsafe driver
> + * sections see console_enter_unsafe().
> + *
> + * When this function returns false then the calling context is not allowed
> + * to go forward and has to back out immediately and carefully. The buffer
> + * content is no longer trusted either and the console lock is no longer
> + * held.
> + */
> +bool console_can_proceed(struct cons_write_context *wctxt)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	struct cons_state state;
> +
> +	cons_state_read(con, CON_STATE_CUR, &state);
> +	/* Store it for analysis or reuse */
> +	copy_full_state(ctxt->old_state, state);
> +
> +	/* Make sure this context is still the owner. */
> +	if (!cons_state_full_match(state, ctxt->state))
> +		return false;
> +
> +	/*
> +	 * Having a safe point for take over and eventually a few
> +	 * duplicated characters or a full line is way better than a
> +	 * hostile takeover. Post processing can take care of the garbage.
> +	 * Continue if the requested priority is not sufficient.
> +	 */
> +	if (state.req_prio <= state.cur_prio)
> +		return true;
> +
> +	/*
> +	 * A console printer within an unsafe region is allowed to continue.
> +	 * It can perform the handover when exiting the safe region. Otherwise
> +	 * a hostile takeover will be necessary.
> +	 */
> +	if (state.unsafe)
> +		return true;

It took me quite some time to wrap my head around the above two comments.
The code is clear but the comments are somehow cryptic ;-)

It is probably because the 1st comment starts talking about a safe point.
But .unsafe is checked after the 2nd comment. And the word "allowed"
confused me in the 2nd comment.

I would explain this in these details in the function description. The
code will be self-explanatory then. I would write something like:

	 * The console is allowed to continue when it still owns the lock
	 * and there is no request from a higher priority context.
	 *
	 * The context might have lost the lock during a hostile takeover.
	 *
	 * The function will handover the lock when there is a request
	 * with a higher priority and the console is in a safe context.
	 * The new owner would print the same line again. But a duplicated
	 * part of a line is better than risking a hostile takeover in
	 * an unsafe context.
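
For illustration, this is roughly how a driver output callback would
use these helpers (only a sketch; foo_hw_putchar() is a made-up
hardware accessor, the helper semantics are taken from the patch):

	static bool foo_write_atomic(struct console *con,
				     struct cons_write_context *wctxt)
	{
		unsigned int i;

		for (i = 0; i < wctxt->len; i++) {
			/* Back out if ownership was lost or requested. */
			if (!console_can_proceed(wctxt))
				return false;

			/* The register access must not be interrupted. */
			if (!console_enter_unsafe(wctxt))
				return false;
			foo_hw_putchar(con, wctxt->outbuf[i]);
			if (!console_exit_unsafe(wctxt))
				return false;
		}

		return true;
	}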

> +
> +	/* Release and hand over */
> +	cons_release(ctxt);
> +	/*
> +	 * This does not check whether the handover succeeded.

This is a bit ambiguous. What exactly does it mean that the handover
succeeded? I guess that it means that the context with the higher
priority successfully took over the lock.

A "failure" might be when the other context timed out and gave up.
In that case, nobody would continue printing.

We actually should wake up the kthread when the lock was not
successfully passed. Or even better, we should release the lock
only when the request is still pending. It should be possible
with the cmpxchg().
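
Something like this at the hand-over point (only a sketch; it reuses
the state snapshot from above, and a real implementation would have to
re-validate ownership when the cmpxchg fails for another reason):

	struct cons_state new;

	copy_full_state(new, ctxt->old_state);
	new.locked = 0;

	/*
	 * Drop ownership only if the state did not change since it was
	 * read above, i.e. only if the request is still pending.
	 */
	if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &ctxt->old_state, &new))
		return false;	/* handed over */

	/* The waiter timed out and gave up: keep the lock and continue. */
	return true;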


> +	 * The outermost callsite has to make the final decision whether printing
> +	 * should continue

This is a bit misleading. The current owner could not continue after
losing the lock. It would need to restart the entire operation.

Is this about the kthread or the cons_flush*() layers? Yes, they have to
decide what to do next. Well, we should make it clear that we are talking
about this layer. The con->atomic_write() layer can only carefully
back off.

Maybe we do not need to describe it here. It should be enough to
mention in the function description that the driver has to
carefully back off and that the buffer content is no longer
trusted. It is already mentioned there.

> +	 * or not (via reacquire, possibly hostile). The
> +	 * console is unlocked already so go back all the way instead of
> +	 * trying to implement heuristics in tons of places.
> +	 */
> +	return false;
> +}
> +
> +static bool __console_update_unsafe(struct cons_write_context *wctxt, bool unsafe)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	struct cons_state new;
> +
> +	do  {
> +		if (!console_can_proceed(wctxt))
> +			return false;
> +		/*
> +		 * console_can_proceed() saved the real state in
> +		 * ctxt->old_state
> +		 */
> +		copy_full_state(new, ctxt->old_state);
> +		new.unsafe = unsafe;
> +
> +	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &ctxt->old_state, &new));

This updates only the bit in struct cons_state. But there is also
"bool unsafe" in struct cons_write_context. Should the boolean
be updated as well?

Or is the boolean needed at all? It seems that it is set in
cons_emit_record() and never read.

> +
> +	copy_full_state(ctxt->state, new);
> +	return true;
> +}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: buffer write race: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions
  2023-03-29 13:58   ` buffer write race: " Petr Mladek
@ 2023-03-29 14:33     ` John Ogness
  2023-03-30 11:54       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-03-29 14:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-03-29, Petr Mladek <pmladek@suse.com> wrote:
>> +/**
>> + * console_can_proceed - Check whether printing can proceed
>> + * @wctxt:	The write context that was handed to the write function
>> + *
>> + * Returns:	True if the state is correct. False if a handover
>> + *		has been requested or if the console was taken
>> + *		over.
>> + *
>> + * Must be invoked after the record was dumped into the assigned record
>> + * buffer
>
> The word "after" made me think about possible races when the record
> buffer is being filled. The owner might lose the lock in a hostile
> way during this action. And we should prevent using the same buffer
> when the other owner is still modifying the content.
>
> It should be safe when the same buffer might be used only by nested
> contexts. It does not matter if the outer context finishes writing
> later. The nested context should not need the buffer anymore.
>
> But a problem might happen when the same buffer is shared between
> more non-nested contexts. One context might lose the lock in a hostile way.
> The other context might get the access after the hostile context
> released the lock.

Hostile takeovers _only occur during panic_.

> NORMAL and PANIC contexts are safe. These priorities have only
> one context and both have their own buffers.
>
> A problem might be with EMERGENCY contexts. Each CPU might have
> its own EMERGENCY context. We might prevent this problem if
> we do not allow to acquire the lock in EMERGENCY (and NORMAL)
> context when panic() is running or after the first hostile
> takeover.

A hostile takeover means a CPU took ownership with PANIC priority. No
CPU can steal ownership from the PANIC owner. Once the PANIC owner
releases ownership, the panic message has been output to the atomic
consoles. Do we really care what happens after that?

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: buffer write race: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions
  2023-03-29 14:33     ` John Ogness
@ 2023-03-30 11:54       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-30 11:54 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Wed 2023-03-29 16:39:54, John Ogness wrote:
> On 2023-03-29, Petr Mladek <pmladek@suse.com> wrote:
> >> +/**
> >> + * console_can_proceed - Check whether printing can proceed
> >> + * @wctxt:	The write context that was handed to the write function
> >> + *
> >> + * Returns:	True if the state is correct. False if a handover
> >> + *		has been requested or if the console was taken
> >> + *		over.
> >> + *
> >> + * Must be invoked after the record was dumped into the assigned record
> >> + * buffer
> >
> > The word "after" made me think about possible races when the record
> > buffer is being filled. The owner might lose the lock in a hostile
> > way during this action. And we should prevent using the same buffer
> > when the other owner is still modifying the content.
> >
> > It should be safe when the same buffer might be used only by nested
> > contexts. It does not matter if the outer context finishes writing
> > later. The nested context should not need the buffer anymore.
> >
> > But a problem might happen when the same buffer is shared between
> > more non-nested contexts. One context might lose the lock in a hostile way.
> > The other context might get the access after the hostile context
> > released the lock.
> 
> Hostile takeovers _only occur during panic_.
>
> > NORMAL and PANIC contexts are safe. These priorities have only
> > one context and both have their own buffers.
> >
> > A problem might be with EMERGENCY contexts. Each CPU might have
> > its own EMERGENCY context. We might prevent this problem if
> > we do not allow to acquire the lock in EMERGENCY (and NORMAL)
> > context when panic() is running or after the first hostile
> > takeover.
> 
> A hostile takeover means a CPU took ownership with PANIC priority. No
> CPU can steal ownership from the PANIC owner. Once the PANIC owner
> releases ownership, the panic message has been output to the atomic
> consoles. Do we really care what happens after that?

I see. The hostile takeover is allowed only in
cons_atomic_exit(CONS_PRIO_PANIC, prev_prio), which is called at the
very end of panic() before the infinite blinking.

It is true that we do not care at this moment. It is actually called
after "suppress_printk = 1;" so that there should not be any
new messages.

Anyway, it would be nice to document this subtle race somewhere.
I could imagine that people would want to risk the hostile
takeover even earlier so the race might get introduced.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* dropped handling: was: Re: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
  2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
  2023-03-03  0:19   ` kernel test robot
@ 2023-03-31 10:29   ` Petr Mladek
  2023-03-31 10:36   ` semantic: " Petr Mladek
  2 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-03-31 10:29 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:10, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Implement an emit function for non-BKL consoles to output printk
> messages. It utilizes the lockless printk_get_next_message() and
> console_prepend_dropped() functions to retrieve/build the output
> message. The emit function includes the required safety points to
> check for handover/takeover and calls a new write_atomic callback
> of the console driver to output the message. It also includes proper
> handling for updating the non-BKL console sequence number.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -1086,6 +1086,123 @@ bool console_exit_unsafe(struct cons_write_context *wctxt)
>  	return __console_update_unsafe(wctxt, false);
>  }
>  
> +/**
> + * cons_get_record - Fill the buffer with the next pending ringbuffer record
> + * @wctxt:	The write context which will be handed to the write function
> + *
> + * Returns:	True if there are records available. If the next record should
> + *		be printed, the output buffer is filled and @wctxt->outbuf
> + *		points to the text to print. If @wctxt->outbuf is NULL after
> + *		the call, the record should not be printed but the caller must
> + *		still update the console sequence number.
> + *
> + *		False means that there are no pending records anymore and the
> + *		printing can stop.
> + */
> +static bool cons_get_record(struct cons_write_context *wctxt)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	bool is_extended = console_srcu_read_flags(con) & CON_EXTENDED;
> +	struct printk_message pmsg = {
> +		.pbufs = ctxt->pbufs,
> +	};
> +
> +	if (!printk_get_next_message(&pmsg, ctxt->newseq, is_extended, true))
> +		return false;
> +
> +	ctxt->newseq = pmsg.seq;
> +	ctxt->dropped += pmsg.dropped;
> +
> +	if (pmsg.outbuf_len == 0) {
> +		wctxt->outbuf = NULL;
> +	} else {
> +		if (ctxt->dropped && !is_extended)
> +			console_prepend_dropped(&pmsg, ctxt->dropped);
> +		wctxt->outbuf = &pmsg.pbufs->outbuf[0];
> +	}
> +
> +	wctxt->len = pmsg.outbuf_len;

This function seems to be needed only because we duplicate the information
in both struct printk_message and struct cons_write_context.

I think that we will not need this function at all if we bundle
struct printk_message into struct cons_context. I mean to replace:

struct cons_context {
	[...]
	struct printk_buffers	*pbufs;
	u64			newseq;
	unsigned long		dropped;
	[...]
}

with

struct cons_context {
	[...]
	struct printk_message pmsg;
	[...]
}
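
cons_emit_record() could then fill the buffer in place, roughly like
this (a sketch only; the exact sequence bookkeeping is glossed over):

	if (!printk_get_next_message(&ctxt->pmsg, ctxt->pmsg.seq,
				     is_extended, true))
		return false;

	if (ctxt->pmsg.dropped && !is_extended)
		console_prepend_dropped(&ctxt->pmsg, ctxt->pmsg.dropped);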

> +
> +	return true;
> +}
> +
> +/**
> + * cons_emit_record - Emit record in the acquired context
> + * @wctxt:	The write context that will be handed to the write function
> + *
> + * Returns:	False if the operation was aborted (takeover or handover).
> + *		True otherwise
> + *
> + * When false is returned, the caller is not allowed to touch console state.
> + * The console is owned by someone else. If the caller wants to print more
> + * it has to reacquire the console first.
> + *
> + * When true is returned, @wctxt->ctxt.backlog indicates whether there are
> + * still records pending in the ringbuffer.
> + */
> +static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	bool done = false;
> +
> +	/*
> +	 * @con->dropped is not protected in case of hostile takeovers so
> +	 * the update below is racy. Annotate it accordingly.
> +	 */
> +	ctxt->dropped = data_race(READ_ONCE(con->dropped));
> +
> +	/* Fill the output buffer with the next record */
> +	ctxt->backlog = cons_get_record(wctxt);
> +	if (!ctxt->backlog)
> +		return true;
> +
> +	/* Safety point. Don't touch state in case of takeover */
> +	if (!console_can_proceed(wctxt))
> +		return false;
> +
> +	/* Counterpart to the read above */
> +	WRITE_ONCE(con->dropped, ctxt->dropped);

These racy hacks with ctxt->dropped won't be needed if we bundle
struct printk_message into struct cons_context.

> +
> +	/*
> +	 * In case of skipped records, update the sequence state in @con.
> +	 */
> +	if (!wctxt->outbuf)
> +		goto update;
> +
> +	/* Tell the driver about potential unsafe state */
> +	wctxt->unsafe = ctxt->state.unsafe;
> +
> +	if (!ctxt->thread && con->write_atomic) {
> +		done = con->write_atomic(con, wctxt);
> +	} else {
> +		cons_release(ctxt);
> +		WARN_ON_ONCE(1);
> +		return false;
> +	}
> +
> +	/* If not done, the write was aborted due to takeover */
> +	if (!done)
> +		return false;
> +
> +	/* If there was a dropped message, it has now been output. */
> +	if (ctxt->dropped) {
> +		ctxt->dropped = 0;
> +		/* Counterpart to the read above */
> +		WRITE_ONCE(con->dropped, ctxt->dropped);

I suggest making con->dropped an atomic_t and using

		atomic_sub(ctxt->dropped, &con->dropped);

> +	}
> +update:
> +	ctxt->newseq++;
> +	/*
> +	 * The sequence update attempt is not part of console_release()
> +	 * because in panic situations the console is not released by
> +	 * the panic CPU until all records are written. On 32bit the
> +	 * sequence is separate from state anyway.
> +	 */
> +	return cons_seq_try_update(ctxt);
> +}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* semantic: Re: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
  2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
  2023-03-03  0:19   ` kernel test robot
  2023-03-31 10:29   ` dropped handling: was: " Petr Mladek
@ 2023-03-31 10:36   ` Petr Mladek
       [not found]     ` <87edp29kvq.fsf@jogness.linutronix.de>
  2 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-03-31 10:36 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:10, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Implement an emit function for non-BKL consoles to output printk
> messages. It utilizes the lockless printk_get_next_message() and
> console_prepend_dropped() functions to retrieve/build the output
> message. The emit function includes the required safety points to
> check for handover/takeover and calls a new write_atomic callback
> of the console driver to output the message. It also includes proper
> handling for updating the non-BKL console sequence number.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_emit_record - Emit record in the acquired context
> + * @wctxt:	The write context that will be handed to the write function
> + *
> + * Returns:	False if the operation was aborted (takeover or handover).
> + *		True otherwise
> + *
> + * When false is returned, the caller is not allowed to touch console state.
> + * The console is owned by someone else. If the caller wants to print more
> + * it has to reacquire the console first.
> + *
> + * When true is returned, @wctxt->ctxt.backlog indicates whether there are
> + * still records pending in the ringbuffer.

This is inconsistent and a bit confusing. This seems to be the only
function returning "true" when there is no pending output.

All the other functions (cons_get_record(), console_emit_next_record(),
and printk_get_next_message()) return false in this case.

It has to distinguish 3 different return states anyway, same as
console_emit_next_record(). I suggest using the same semantics
and distinguishing "no pending records" from "handed over lock"
via a "handover" flag. Or maybe the caller should just check
if it still owns the lock.
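
For reference, the suggested calling convention would look roughly
like this (a sketch; cons_emit_next_record() with a "handover"
parameter is the suggested semantic, not existing code):

	bool handover;

	for (;;) {
		/* False: no pending records, printing can stop. */
		if (!cons_emit_next_record(&wctxt, &handover))
			break;

		/* The lock was handed over, back out immediately. */
		if (handover)
			break;
	}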

> + */
> +static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	bool done = false;
> +
> +	/*
> +	 * @con->dropped is not protected in case of hostile takeovers so
> +	 * the update below is racy. Annotate it accordingly.
> +	 */
> +	ctxt->dropped = data_race(READ_ONCE(con->dropped));
> +
> +	/* Fill the output buffer with the next record */
> +	ctxt->backlog = cons_get_record(wctxt);
> +	if (!ctxt->backlog)
> +		return true;
> +
> +	/* Safety point. Don't touch state in case of takeover */
> +	if (!console_can_proceed(wctxt))
> +		return false;
> +
> +	/* Counterpart to the read above */
> +	WRITE_ONCE(con->dropped, ctxt->dropped);
> +
> +	/*
> +	 * In case of skipped records, update the sequence state in @con.
> +	 */
> +	if (!wctxt->outbuf)
> +		goto update;
> +
> +	/* Tell the driver about potential unsafe state */
> +	wctxt->unsafe = ctxt->state.unsafe;
> +
> +	if (!ctxt->thread && con->write_atomic) {

I would expect this check in console_is_usable(), same as for legacy
consoles.

And what is actually the difference between con->write_atomic()
and con->write_thread(), where write_thread() is added later
in the 11th patch?

I guess that the motivation is that the kthread variant
might sleep. But I do not see it described anywhere.

Do we really need two callbacks? I would expect that the code
would be basically the same.

Maybe the callback could call cond_resched() when running
in the kthread; this information might be passed via a flag.

Or is this a preparation for tty code where the implementation
would be really different?

> +		done = con->write_atomic(con, wctxt);
> +	} else {
> +		cons_release(ctxt);
> +		WARN_ON_ONCE(1);
> +		return false;
> +	}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: simplify: was: Re: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
  2023-03-21 15:36     ` Petr Mladek
@ 2023-04-02 18:39       ` John Ogness
  0 siblings, 0 replies; 92+ messages in thread
From: John Ogness @ 2023-04-02 18:39 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> I would prefer to do the logic change. It might help with review
> and also with the long term maintenance.

As a test I have implemented many of your suggestions. Following your
ideas of simplification, I have also found other areas where things can
be simplified without sacrificing functionality. Thanks for taking the
time to propose an alternate simplified implementation.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: semantic: Re: [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing
       [not found]         ` <87ilecsrvl.fsf@jogness.linutronix.de>
@ 2023-04-04 14:09           ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-04 14:09 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Mon 2023-04-03 21:17:26, John Ogness wrote:
> On 2023-04-03, Petr Mladek <pmladek@suse.com> wrote:
> >> The main difference is that the kthread variant is invoked _only_
> >> from the kthread printer. It may or may not mean that the callback
> >> can sleep. (That depends on how the console implements the
> >> port_lock() callback.) But the function can be certain that it wasn't
> >> called from any bizarre interrupt/nmi/scheduler contexts.
> >> 
> >> The atomic callback can be called from anywhere! Including when it
> >> was already inside the atomic callback function! That probably
> >> requires much more careful coding than in the kthread case.
> >
> > Is it just about coding? Or is it also about making write_thread()
> > better schedulable, please?
> 
> For UARTs there probably isn't much of a difference because most disable
> interrupts anyway. A diff in the latest version [0] of the 8250 nbcon
> console between serial8250_console_write_atomic() and
> serial8250_console_write_thread() shows no significant difference in the
> two except that the atomic variant may prefix with a newline if it
> interrupted a printing context.
> 
> But for vtcon and netconsole I expect there would be a very significant
> difference. For vtcon (and possibly netconsole) I expect there will be
> no atomic implementation at all.

Then these consoles might need another solution for the panic()
situation, like blocking the kthread and switching to the legacy
mode.

OK, so it might really make sense to have a separate callback
for the kthread and emergency/panic contexts.


> > Hmm, it is very questionable if the callbacks should be optional.
> >
> > Do we really want to allow introducing non-blocking consoles without
> > a way to print emergency and panic messages? Such a console would
> > not be a fully-fledged replacement of the legacy mode.
> 
> Not necessarily. For vtcon we are "planning" on a BSoD, probably based
> on the kmsg_dump interface. For netconsole it could be similar.
> 
> We are trying to give drivers an opportunity to implement some safety
> and control to maximize the chances of getting a dump out without
> jeopardizing other panic functions.
> 
> As a quick implementation a UART driver could simply set @unsafe upon
> entrance of write_thread() and clear it on exit. Then its write_atomic()
> would only be called at the very end of panic() if the panic happened
> during printing. For its write_atomic() implementation it could be a
> NOOP if !oops_in_progress. All locking is handled with the port_lock()
> callback, which is only called from the kthread context. It isn't
> particularly pretty, but it most likely would be more reliable than what
> we have now.

If I understand it correctly, the above scenario requires both
write_thread() and write_atomic(). Otherwise, it would not be
able to print in panic() at all. Is that right?


> > What about making write_atomic() mandatory and write_thread()
> > optional?
> 
> I doubt there will ever be a write_atomic() for vtcon. BSoD based on
> kmsg_dump is a simpler approach.

But we would need to add an infrastructure for the BSoD. For example,
call yet another callback at the end of panic(). Also, it does not mean
that we could completely give up on printk() messages in panic().

Anyway, this might be solved later. I would really like to enforce
having both callbacks as a good enough solution for now. It might later
be replaced by another good enough solution using another panic() mode.


> > write_atomic() would be used in the kthread when write_thread()
> > is not defined. write_thread() would allow creating an alternative
> > implementation that might work better in the well-defined kthread
> > context.
> 
> write_atomic() is the difficult callback to implement. It certainly
> could be used in the write_thread() case if no write_thread() was
> provided. But I still think there are valid cases to have a
> write_thread() implementation without a write_atomic().

The proposed framework does not provide a solution for consoles
that can't implement write_atomic(). And ignoring the messages
in panic() is not acceptable.

Either we need to enforce another good enough solution for these
consoles. Or we must not allow them now. We could update the logic
later when we see how the BSoD really looks and works.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* boot console: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
  2023-03-03  1:23   ` kernel test robot
@ 2023-04-05 10:48   ` Petr Mladek
  2023-04-06  8:09   ` wakeup synchronization: " Petr Mladek
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-05 10:48 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.
> 
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -75,6 +77,55 @@ u64 cons_read_seq(struct console *con);
>  void cons_nobkl_cleanup(struct console *con);
>  bool cons_nobkl_init(struct console *con);
>  bool cons_alloc_percpu_data(struct console *con);
> +void cons_kthread_create(struct console *con);
> +
> +/*
> + * Check if the given console is currently capable and allowed to print
> + * records. If the caller only works with certain types of consoles, the
> + * caller is responsible for checking the console type before calling
> + * this function.
> + */
> +static inline bool console_is_usable(struct console *con, short flags)
> +{
> +	if (!(flags & CON_ENABLED))
> +		return false;
> +
> +	if ((flags & CON_SUSPENDED))
> +		return false;
> +
> +	/*
> +	 * The usability of a console varies depending on whether
> +	 * it is a NOBKL console or not.
> +	 */
> +
> +	if (flags & CON_NO_BKL) {
> +		if (have_boot_console)
> +			return false;

I am not sure if this is the right place to discuss it.
Different patches add pieces that are part of the puzzle.

Anyway, how are the NOBKL consoles supposed to work when a boot console
is still registered, please?

I see that a later patch adds:

asmlinkage int vprintk_emit(int facility, int level,
			    const struct dev_printk_info *dev_info,
			    const char *fmt, va_list args)
{
[...]
	/*
	 * Flush the non-BKL consoles. This only leads to direct atomic
	 * printing for non-BKL consoles that do not have a printer
	 * thread available. Otherwise the printer thread will perform
	 * the printing.
	 */
	cons_atomic_flush(&wctxt, true);
[...]
}

This patch adds cons_kthread_create(). And it refuses to create
the kthread as long as there is a boot console registered.

Finally, the later-added cons_atomic_flush() ignores consoles where
console_is_usable() returns false:

void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt, bool skip_unsafe)
{
[...]
	for_each_console_srcu(con) {
[...]
		if (!console_is_usable(con, flags))
			continue;

It looks to me that NOBKL consoles will not show anything as long as
a boot console is registered.

And the boot console might never be removed when the "keep_bootcon"
parameter is used.


Sigh, this looks like a non-trivial problem. I see it as a combination
of two things:

   + NOBKL consoles are independent. And this is actually one
     big feature.

   + There is no 1:1 relation between boot and real console using
     the same output device (serial port). I mean that
     register_console() is not able to match which real console
     is replacing a particular boot console.

As a result, we cannot easily synchronize a boot console and the related
real console against each other.

I see three possible solutions:

A) Ignore this problem. People are most likely using only one boot
   console. And the real console will get enabled "immediately"
   after this console gets removed. So there should not be
   any gap.

   The only problem is when people use more real consoles. Then
   even the unrelated real consoles will not see anything.

   I guess that people might notice that they do not see anything
   on the ttyX console until ttySX replaces an early serial console.
   And they might consider this a regression.


B) Allow matching boot and real consoles that use the same output device
   and properly synchronizing them against each other.

   It might mean:

       + sharing the same atomic lock (atomic_state)
       + sharing the same device (port) lock
       + avoid running both at the same time by a careful
	 switch during the registration of the real console

    , where sharing the same port lock might theoretically be done
    without 1:1 matching of the related console drivers. They
    would use the same port->lock spin_lock.

    This might also fix the ugly race during console registration
    when we unregister the boot console too early or too late. The
    switch between a boot console and the related real console would
    then always be smooth.

    The problem is that it might be pretty complicated to achieve
    this.


C) Synchronize the NOBKL consoles using console_sem as well, until
   all boot consoles are removed and the kthreads are started.

   It might actually be pretty easy. It might be enough to
   move cons_flush_all() from vprintk_emit() into
   console_flush_all(), which is called under console_lock().

   I think that we need to keep cons_flush_all() in vprintk_emit()
   to emit the messages directly in EMERGENCY and PANIC modes.
   But we do not need to, and must not, call it there while there is
   still a boot console. We would know that it will be called later
   from console_unlock() in this case.
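
   I.e. the vprintk_emit() side could look like this (just a sketch;
   cons_get_cpu_state() and CONS_PRIO_EMERGENCY are taken from later
   patches in the series, and their visibility in printk.c is glossed
   over):

	/*
	 * In EMERGENCY and PANIC modes the messages must go out
	 * directly. Otherwise, rely on console_flush_all() under
	 * console_lock() as long as a boot console is registered.
	 */
	if (!have_boot_console ||
	    cons_get_cpu_state()->prio >= CONS_PRIO_EMERGENCY)
		cons_atomic_flush(&wctxt, true);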


My preferences:

   + A is probably not acceptable. It would definitely make me feel
     very uncomfortable.

   + B looks like the best solution but it might be very hard to achieve.

   + C seems to be good enough for now.

I think that C is the only realistic way to go unless there is another
reasonable solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* wakeup synchronization: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
  2023-03-03  1:23   ` kernel test robot
  2023-04-05 10:48   ` boot console: was: " Petr Mladek
@ 2023-04-06  8:09   ` Petr Mladek
  2023-04-06  9:46   ` port lock: " Petr Mladek
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-06  8:09 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.
>
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_kthread_func - The printk thread function
> + * @__console:	Console to operate on
> + */
> +static int cons_kthread_func(void *__console)
> +{
> +	struct console *con = __console;
> +	struct cons_write_context wctxt = {
> +		.ctxt.console	= con,
> +		.ctxt.prio	= CONS_PRIO_NORMAL,
> +		.ctxt.thread	= 1,
> +	};
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	unsigned long flags;
> +	short con_flags;
> +	bool backlog;
> +	int cookie;
> +	int ret;
> +
> +	for (;;) {
> +		atomic_inc(&con->kthread_waiting);

Sigh, I really have a hard time wrapping my head around the barriers
here. This part looks fine. The rcuwait_wait_event() provides a full
barrier before checking the "condition".

But I am not sure about the counterpart. It is in another patch.
IMHO, there should be a full barrier before checking
con->kthread_waiting. Something like this:

+  void cons_wake_threads(void)
+ {
+ 	struct console *con;
+ 	int cookie;
+

	/*
	 * Full barrier against rcuwait_wait_event() in	cons_kthread_func().
	 *
	 * The purpose of this barrier is to make sure that the new record is
	 * stored before checking con->kthread_waiting.
	 *
	 * It has the same purpose as the full barrier in rcuwait_wake_up().
	 * It makes sure that cons_kthread_should_wakeup() sees the new
	 * record before going to sleep in rcuwait_wait_event().
	 *
	 * The extra barrier is needed here because rcuwait_wake_up() is called
	 * only when we see con->kthread_waiting set. We need to make sure
	 * that either we see con->kthread_waiting or cons_kthread_func()
	 * will see the new record when checking the condition in
	 * rcuwait_wait_event().
	 */
	smp_mb();

+ 	cookie = console_srcu_read_lock();
+ 	for_each_console_srcu(con) {
+ 		if (con->kthread && atomic_read(&con->kthread_waiting))
+ 			irq_work_queue(&con->irq_work);
+ 	}
+ 	console_srcu_read_unlock(cookie);
+ }

I think that I am right. But I am not in a good "see-barriers" mood so
I also might be wrong.

> +
> +		/*
> +		 * Provides a full memory barrier vs. cons_kthread_wake().
> +		 */
> +		ret = rcuwait_wait_event(&con->rcuwait,
> +					 cons_kthread_should_wakeup(con, ctxt),
> +					 TASK_INTERRUPTIBLE);

I am sorry but I would need some explanation for this. I am not
familiar with the rcuwait API. I looked at the code, commit messages,
and various users, and I am still not sure.

My assumption is that this allows waiting for an event on "con"
when the lifetime of this structure is synchronized using SRCU.
The counterpart calls rcuwait_wake_up() under srcu_read_lock().

I am afraid that it is more complicated in our case.
We do not call rcuwait_wake_up() under srcu_read_lock().
We call it from an irq_work that might be processed later,
after srcu_read_unlock().

IMHO, we need to make sure that there is no pending irq_work
and that nobody can create a new one after the exit from
unregister_console(). There seems to be irq_work_sync()
for this purpose.

> +
> +		atomic_dec(&con->kthread_waiting);
> +
> +		if (kthread_should_stop())
> +			break;
> +
> +		/* Wait was interrupted by a spurious signal, go back to sleep */
> +		if (ret)
> +			continue;
> +
> +		for (;;) {
[...]
> +
> +			if (console_is_usable(con, con_flags)) {
> +				/*
> +				 * If the emit fails, this context is no
> +				 * longer the owner. Abort the processing and
> +				 * wait for new records to print.
> +				 */
> +				if (!cons_emit_record(&wctxt))
> +					break;
> +
> +				backlog = ctxt->backlog;
> +			} else {
> +				backlog = false;
> +			}
[...]
> +	}
> +	return 0;
> +}
> +

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
                     ` (2 preceding siblings ...)
  2023-04-06  8:09   ` wakeup synchronization: " Petr Mladek
@ 2023-04-06  9:46   ` Petr Mladek
  2023-04-20  9:55     ` Petr Mladek
  2023-04-06 13:19   ` misc: " Petr Mladek
  2023-04-13 13:28   ` (k)thread: " Petr Mladek
  5 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-04-06  9:46 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.

> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_kthread_func - The printk thread function
> + * @__console:	Console to operate on
> + */
> +static int cons_kthread_func(void *__console)
> +{
> +	struct console *con = __console;
> +	struct cons_write_context wctxt = {
> +		.ctxt.console	= con,
> +		.ctxt.prio	= CONS_PRIO_NORMAL,
> +		.ctxt.thread	= 1,
> +	};
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	unsigned long flags;
> +	short con_flags;
> +	bool backlog;
> +	int cookie;
> +	int ret;
> +
> +	for (;;) {
> +		atomic_inc(&con->kthread_waiting);
> +
> +		/*
> +		 * Provides a full memory barrier vs. cons_kthread_wake().
> +		 */
> +		ret = rcuwait_wait_event(&con->rcuwait,
> +					 cons_kthread_should_wakeup(con, ctxt),
> +					 TASK_INTERRUPTIBLE);
> +
> +		atomic_dec(&con->kthread_waiting);
> +
> +		if (kthread_should_stop())
> +			break;
> +
> +		/* Wait was interrupted by a spurious signal, go back to sleep */
> +		if (ret)
> +			continue;
> +
> +		for (;;) {
> +			cookie = console_srcu_read_lock();
> +
> +			/*
> +			 * Ensure this stays on the CPU to make handover and
> +			 * takeover possible.
> +			 */
> +			if (con->port_lock)
> +				con->port_lock(con, true, &flags);

IMHO, we should use a more generic name. This should be a lock that
provides full synchronization between con->write() and other
operations on the device used by the console.

"port_lock" is specific for the serial consoles. IMHO, other consoles
might use another lock. IMHO, tty uses "console_lock" internally for
this purpose. netconsole seems to has "target_list_lock" that might
possible have this purpose, s390 consoles are using sclp_con_lock,
sclp_vt220_lock, or get_ccwdev_lock(raw->cdev).


Honestly, I expected that we could replace these locks with
cons_acquire_lock(). I know that the new lock is special: sleeping,
timing out, and allowing hand-over by priorities.

But I think that we might implement a cons_acquire_lock() variant that
always busy waits without any timeout, and use some "priority" that
would never hand over the lock voluntarily. The only difference would
be that it is sleeping. But it might be acceptable in many cases.

Using the new lock instead of port->lock would allow removing
the tricks with using spin_trylock() when oops_in_progress is set.

That said, I am not sure if this is possible without major changes.
For example, in the case of serial consoles, it would require touching
the layer using port->lock.

Also it would require a 1:1 relation between struct console and the
output device lock. I am not sure if that is always the case. On the
other hand, adding some infrastructure for this 1:1 relationship would
help to achieve a smooth transition from the boot to the real console
driver.


OK, let's first define what the two locks are supposed to synchronize.
My understanding is that this patchset uses them the following way:

    + The new lock (atomic_state) is used to serialize emitting
      messages between different write contexts. It replaces
      the functionality of console_lock.

      It is a per-console sleeping lock that allows voluntary and
      hostile hand-over using priorities, and spinning with a timeout.


    + The port_lock is used to synchronize various operations
      of the console driver/device, like probe, init, exit,
      configuration update.

      It is typically a per-console driver/device spin lock.


I guess that we would want to keep both locks:

    + it might help to keep the rework manageable

    + the sleeping lock might complicate some operations;
      raw_spin_lock might be necessary at least on
      non-RT systems.


Are there better names? What about?

    + emit_lock() or ctxt_lock() for the new special lock

    + device_lock() or driver_lock() as a generic name
      for the driver/device specific lock.

Sigh, the problem with the device_lock()/driver_lock()
is that it might get confused with:

	struct tty_driver	*(*device)(struct console *co, int *index);

It would be really great to make it clear that this callback is about
the connection to the tty layer. I would rename it to:

	struct tty_driver	*(*tty_drv)(struct console *co, int *index);
or
	struct tty_driver	*(*tty_driver)(struct console *co, int *index);


> +			else
> +				migrate_disable();
> +
> +			/*
> +			 * Try to acquire the console without attempting to
> +			 * take over. If an atomic printer wants to hand
> +			 * back to the thread it simply wakes it up.
> +			 */
> +			if (!cons_try_acquire(ctxt))
> +				break;
> +

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* misc: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
                     ` (3 preceding siblings ...)
  2023-04-06  9:46   ` port lock: " Petr Mladek
@ 2023-04-06 13:19   ` Petr Mladek
  2023-04-13 13:28   ` (k)thread: " Petr Mladek
  5 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-06 13:19 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.

This deserves a more detailed description. It should describe
the expected behavior of the newly added pieces of the puzzle.

> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -714,6 +717,14 @@ static bool __cons_try_acquire(struct cons_context *ctxt)
>  		goto success;
>  	}
>  
> +	/*
> +	 * A threaded printer context will never spin or perform a
> +	 * hostile takeover. The atomic writer will wake the thread
> +	 * when it is done with the important output.
> +	 */
> +	if (ctxt->thread)
> +		return false;

I suggest removing this optimization. Or replacing it with a check
for the lowest NORMAL priority.

First, it is conceptually questionable. The kthread might actually want to
take over the lock. It is the preferred context when the system
works properly.

Second, it is a bit superfluous. The kthread will give up on the next check
anyway because it is the context with the lowest NORMAL priority.

I guess that this is another relic of the first POC that allowed
taking over the lock from a context of the same priority.

I thought more about passing the lock:

Passing the lock between console contexts of the same priority would
have basically the same effect as the console_trylock_spinning() trick
in the legacy code. The only motivation would be to reduce
the risk of softlockups. But it would make sense only in the EMERGENCY
contexts. There should be only one NORMAL and one PANIC context.

Also, passing the lock between contexts of the same priority would
be more complicated with the NOBKL consoles. Some messages (parts)
might be printed many times when the lock is passed in the middle
of a record and the new owner always starts from scratch.

> +
>  	/*
>  	 * If the active context is on the same CPU then there is
>  	 * obviously no handshake possible.
> @@ -871,6 +882,9 @@ static bool __cons_release(struct cons_context *ctxt)
>  	return true;
>  }
>

> +static bool printk_threads_enabled __ro_after_init;

This might deserve a comment about when exactly it gets enabled.
My understanding is that it is set during the boot phase
when it becomes safe to create the kthreads.

> +static bool printk_force_atomic __initdata;

I guess that this will be a kernel parameter. But it is not defined in
this patch. The logic should be introduced together with the parameter.

> +
>  /**
>   * cons_release - Release the console after output is done
>   * @ctxt:	The acquire context that contains the state
> @@ -1203,6 +1219,243 @@ static int __maybe_unused cons_emit_record(struct cons_write_context *wctxt)
>  	return cons_seq_try_update(ctxt);
>  }
>  
> +/**
> + * cons_kthread_should_wakeup - Check whether the printk thread should wakeup
> + * @con:	Console to operate on
> + * @ctxt:	The acquire context that contains the state
> + *		at console_acquire()
> + *
> + * Returns: True if the thread should shut down or if the console is allowed to
> + * print and a record is available. False otherwise.
> + *
> + * After the thread wakes up, it must first check if it should shut down before
> + * attempting any printing.

I would move this comment right above the kthread_should_stop()
check. I think that there is a bigger chance of seeing it there.

> + */
> +static bool cons_kthread_should_wakeup(struct console *con, struct cons_context *ctxt)
> +{
> +	bool is_usable;
> +	short flags;
> +	int cookie;
> +
> +	if (kthread_should_stop())
> +		return true;
> +
> +	cookie = console_srcu_read_lock();
> +	flags = console_srcu_read_flags(con);
> +	is_usable = console_is_usable(con, flags);
> +	console_srcu_read_unlock(cookie);
> +
> +	if (!is_usable)
> +		return false;
> +
> +	/* This reads state and sequence on 64bit. On 32bit only state */
> +	cons_state_read(con, CON_STATE_CUR, &ctxt->state);
> +
> +	/*
> +	 * Atomic printing is running on some other CPU. The owner
> +	 * will wake the console thread on unlock if necessary.
> +	 */
> +	if (ctxt->state.locked)
> +		return false;
> +
> +	/* Bring the sequence in @ctxt up to date */
> +	cons_context_set_seq(ctxt);

This name is a bit confusing. It looks like it is setting some state,
but its primary purpose is actually to read the value.

Also the function sets both "oldseq" and "newseq". This looks superfluous.
The caller will need to refresh the values once again after
cons_try_acquire_lock().

It should be enough to set "oldseq" here and "newseq" in cons_emit_record().

Finally, in another mail I suggested moving:

    + ctxt.newseq -> ctxt.pmsg.seq
    + ctxt.oldseq -> ctxt.con_seq		// cache of con->seq

What about renaming the function to something like:

    + cons_context_read_con_seq()
    + cons_context_refresh_con_seq()


> +
> +	return prb_read_valid(prb, ctxt->oldseq, NULL);
> +}
> +
> +/**
> + * cons_kthread_func - The printk thread function
> + * @__console:	Console to operate on
> + */
> +static int cons_kthread_func(void *__console)
> +{
> +	struct console *con = __console;
> +	struct cons_write_context wctxt = {
> +		.ctxt.console	= con,
> +		.ctxt.prio	= CONS_PRIO_NORMAL,
> +		.ctxt.thread	= 1,
> +	};
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	unsigned long flags;
> +	short con_flags;
> +	bool backlog;
> +	int cookie;
> +	int ret;
> +
> +	for (;;) {
> +		atomic_inc(&con->kthread_waiting);
> +
> +		/*
> +		 * Provides a full memory barrier vs. cons_kthread_wake().
> +		 */
> +		ret = rcuwait_wait_event(&con->rcuwait,
> +					 cons_kthread_should_wakeup(con, ctxt),
> +					 TASK_INTERRUPTIBLE);
> +
> +		atomic_dec(&con->kthread_waiting);
> +
> +		if (kthread_should_stop())
> +			break;
> +
> +		/* Wait was interrupted by a spurious signal, go back to sleep */
> +		if (ret)
> +			continue;
> +
> +		for (;;) {
> +			cookie = console_srcu_read_lock();
> +
> +			/*
> +			 * Ensure this stays on the CPU to make handover and
> +			 * takeover possible.
> +			 */
> +			if (con->port_lock)
> +				con->port_lock(con, true, &flags);
> +			else
> +				migrate_disable();
> +
> +			/*
> +			 * Try to acquire the console without attempting to
> +			 * take over. If an atomic printer wants to hand
> +			 * back to the thread it simply wakes it up.
> +			 */
> +			if (!cons_try_acquire(ctxt))
> +				break;
> +
> +			con_flags = console_srcu_read_flags(con);
> +
> +			if (console_is_usable(con, con_flags)) {
> +				/*
> +				 * If the emit fails, this context is no
> +				 * longer the owner. Abort the processing and
> +				 * wait for new records to print.
> +				 */
> +				if (!cons_emit_record(&wctxt))

Please, rename the function to cons_emit_next_record() to match
the corresponding console_emit_next_record().

> +					break;
> +				backlog = ctxt->backlog;

Also please pass the 3rd possible return state via a "handover" variable
to match the semantics of console_emit_next_record().

> +			} else {
> +				backlog = false;
> +			}
> +
> +			/*
> +			 * If the release fails, this context was not the
> +			 * owner. Abort the processing and wait for new
> +			 * records to print.
> +			 */
> +			if (!cons_release(ctxt))
> +				break;
> +
> +			/* Backlog done? */
> +			if (!backlog)
> +				break;
> +
> +			if (con->port_lock)
> +				con->port_lock(con, false, &flags);
> +			else
> +				migrate_enable();
> +
> +			console_srcu_read_unlock(cookie);
> +
> +			cond_resched();
> +		}
> +		if (con->port_lock)
> +			con->port_lock(con, false, &flags);
> +		else
> +			migrate_enable();
> +
> +		console_srcu_read_unlock(cookie);
> +	}
> +	return 0;
> +}
> +
> +/**
> + * cons_kthread_stop - Stop a printk thread
> + * @con:	Console to operate on
> + */
> +static void cons_kthread_stop(struct console *con)
> +{
> +	lockdep_assert_console_list_lock_held();
> +
> +	if (!con->kthread)
> +		return;

We need some tricks here to make sure that cons_kthread_wake()
will no longer wake it up:

	con->block_wakeup = true;
	irq_work_sync(&con->irq_work);

> +	kthread_stop(con->kthread);
> +	con->kthread = NULL;
> +
> +	kfree(con->thread_pbufs);
> +	con->thread_pbufs = NULL;
> +}
> +
> +/**
> + * cons_kthread_create - Create a printk thread
> + * @con:	Console to operate on
> + *
> + * If it fails, let the console proceed. The atomic part might
> + * be usable and useful.
> + */
> +void cons_kthread_create(struct console *con)
> +{
> +	struct task_struct *kt;
> +	struct console *c;
> +
> +	lockdep_assert_console_list_lock_held();
> +
> +	if (!(con->flags & CON_NO_BKL) || !con->write_thread)
> +		return;
> +
> +	if (!printk_threads_enabled || con->kthread)
> +		return;
> +
> +	/*
> +	 * Printer threads cannot be started as long as any boot console is
> +	 * registered because there is no way to synchronize the hardware
> +	 * registers between boot console code and regular console code.
> +	 */
> +	for_each_console(c) {
> +		if (c->flags & CON_BOOT)
> +			return;
> +	}
> +	have_boot_console = false;
> +
> +	con->thread_pbufs = kmalloc(sizeof(*con->thread_pbufs), GFP_KERNEL);
> +	if (!con->thread_pbufs) {
> +		con_printk(KERN_ERR, con, "failed to allocate printing thread buffers\n");
> +		return;
> +	}
> +
> +	kt = kthread_run(cons_kthread_func, con, "pr/%s%d", con->name, con->index);
> +	if (IS_ERR(kt)) {
> +		con_printk(KERN_ERR, con, "failed to start printing thread\n");
> +		kfree(con->thread_pbufs);
> +		con->thread_pbufs = NULL;
> +		return;

We should make sure that this console will still get flushed either
in vprintk_emit() or in console_unlock(). I think that it is not
guaranteed by this patchset.

> +	}
> +
> +	con->kthread = kt;
> +
> +	/*
> +	 * It is important that console printing threads are scheduled
> +	 * shortly after a printk call and with generous runtime budgets.
> +	 */
> +	sched_set_normal(con->kthread, -20);
> +}
> +

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups
  2023-03-02 19:56 ` [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups John Ogness
@ 2023-04-12  9:38   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-12  9:38 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:12, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add a function to wakeup the printer threads. Use the new function
> when:
> 
>   - records are added to the printk ringbuffer
>   - consoles are started
>   - consoles are resumed
> 
> The actual waking is performed via irq_work so that the wakeup can
> be triggered from any context.
>
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -317,6 +318,7 @@ struct cons_context_data;
>   * @thread_pbufs:	Pointer to thread private buffer
>   * @kthread:		Pointer to kernel thread
>   * @rcuwait:		RCU wait for the kernel thread
> + * @irq_work:		IRQ work for thread wakeup

I would call this irq_wakeup_work, wakeup_work, or kthread_wakeup_work.

>   * @kthread_waiting:	Indicator whether the kthread is waiting to be woken
>   * @write_atomic:	Write callback for atomic context
>   * @write_thread:	Write callback for printk threaded printing
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3226,9 +3237,23 @@ EXPORT_SYMBOL(console_stop);
>  
>  void console_start(struct console *console)
>  {
> +	short flags;
> +
>  	console_list_lock();
>  	console_srcu_write_flags(console, console->flags | CON_ENABLED);
> +	flags = console->flags;
>  	console_list_unlock();
> +
> +	/*
> +	 * Ensure that all SRCU list walks have completed. The related
> +	 * printing context must be able to see it is enabled so that
> +	 * it is guaranteed to wake up and resume printing.
> +	 */
> +	synchronize_srcu(&console_srcu);

Either this is needed only when the console is CON_NO_BKL or
it was needed even before this patchset.

I am not sure if we need it at all. It will help only for not-yet-started
SRCU walks. But they should see the change because the modification
was done under console_list_lock(). It should provide some
memory barrier against srcu_read_lock(). But maybe I do not understand
the srcu_read_lock() guarantees completely.

> +
> +	if (flags & CON_NO_BKL)
> +		cons_kthread_wake(console);
> +
>  	__pr_flush(console, 1000, true);
>  }
>  EXPORT_SYMBOL(console_start);
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -1368,6 +1368,37 @@ static int cons_kthread_func(void *__console)
>  	return 0;
>  }
>  
> +/**
> + * cons_irq_work - irq work to wake printk thread
> + * @irq_work:	The irq work to operate on
> + */
> +static void cons_irq_work(struct irq_work *irq_work)
> +{
> +	struct console *con = container_of(irq_work, struct console, irq_work);
> +
> +	cons_kthread_wake(con);
> +}
> +
> +/**
> + * cons_wake_threads - Wake up printing threads
> + *
> + * A printing thread is only woken if it is within the @kthread_waiting
> + * block. If it is not within the block (or enters the block later), it
> + * will see any new records and continue printing on its own.
> + */
> +void cons_wake_threads(void)
> +{
> +	struct console *con;
> +	int cookie;
> +
> +	cookie = console_srcu_read_lock();
> +	for_each_console_srcu(con) {
> +		if (con->kthread && atomic_read(&con->kthread_waiting))

I studied the code more. And I think that the custom con->kthread_waiting
is not needed with this approach. IMHO, rcuwait_active() would do
the same job.

IMHO, this is supposed to do the same optimization as
wq_has_sleeper(&log_wait) in __wake_up_klogd().

That said, we need to add a full barrier (smp_mb()) before
rcuwait_active(). It is already bundled in wq_has_sleeper() but not
in rcuwait_active().
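
I.e. something like this in cons_wake_threads() (just a sketch):

	/* Pairs with the full barrier in rcuwait_wait_event(). */
	smp_mb();

	cookie = console_srcu_read_lock();
	for_each_console_srcu(con) {
		if (con->kthread && rcuwait_active(&con->rcuwait))
			irq_work_queue(&con->irq_work);
	}
	console_srcu_read_unlock(cookie);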


> +			irq_work_queue(&con->irq_work);
> +	}
> +	console_srcu_read_unlock(cookie);

Note that this solution would require blocking and canceling
the irq_work before stopping the kthread, see
https://lore.kernel.org/r/ZC5+Hn0bOhMrVci6@alley


An alternative solution would be to have a global printk_kthread_waiting
atomic counter and move the SRCU read lock into the IRQ context.

I mean something like:

atomic_t printk_kthread_waiting = ATOMIC_INIT(0);

void cons_thread_wakeup_func(struct irq_work *irq_work)
{
	struct console *con;
	int cookie;

	cookie = console_srcu_read_lock();
	for_each_console_srcu(con) {
		/* The kthread is started later during boot. */
		if (!con->kthread)
			continue;

		/*
		 * Make sure that the record was written before we
		 * wake up the kthread so that
		 * cons_kthread_should_wakeup() will see it.
		 *
		 * It pairs with the implicit barrier in
		 * rcuwait_wait_event().
		 */
		smp_mb();
		if (!rcuwait_active(&con->rcuwait))
			continue;

		cons_kthread_wake(con);
	}
}

void cons_wake_threads(void)
{
	/*
	 * Make sure that the record is stored before checking
	 * printk_kthread_waiting. Either the kthread will see the
	 * new record when checking cons_kthread_should_wakeup(),
	 * or the check below will see the printk_kthread_waiting
	 * counter incremented.
	 *
	 * The corresponding barrier is in cons_kthread_func().
	 */
	smp_mb();
	if (atomic_read(&printk_kthread_waiting))
		irq_work_queue(&cons_thread_wakeup_work);
}

and finally:

static int cons_kthread_func(void *__console)
{
[...]
	for (;;) {
		atomic_inc(&printk_kthread_waiting);

		/*
		 * Synchronize against cons_wake_threads().
		 *
		 * Make sure that either cons_wake_threads() will see
		 * that we are going to wait, or we will see the new
		 * record that was stored before cons_wake_threads()
		 * was called.
		 */
		smp_mb();

		/*
		 * Provides a full memory barrier against rcuwait_active()
		 * check in cons_thread_wakeup_func().
		 */
		ret = rcuwait_wait_event(&con->rcuwait,
					 cons_kthread_should_wakeup(con, ctxt),
					 TASK_INTERRUPTIBLE);

		atomic_dec(&printk_kthread_waiting);
[...]
}

Note that the printk_kthread_waiting counter is needed in this case
because we do not have a global wait queue. And we could not have a
global one because rcuwait supports only a single task, while the
classic waitqueue supports more tasks via struct wait_queue_head.

Difference between the two solutions:

    + The original solution should not need con->kthread_waiting
      in the end. But we would need to make sure that the irq_work
      cannot be newly scheduled, and is no longer pending, when
      stopping the kthread.

    + The alternative solution is easier with respect to removing
      a console because the srcu list is walked in the irq_work.
      But it would require the global printk_kthread_waiting
      counter because rcuwait supports only one task and we
      need to check if any task is waiting.

The advantage of the alternative solution might be
that srcu_read_lock() would be needed only when there is
a waiting kthread. I am not sure how important it is
to reduce the number of srcu read locked contexts.

I do not really have any preferences.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement
  2023-03-02 19:56 ` [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement John Ogness
@ 2023-04-12 14:53   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-12 14:53 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:14, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Threaded printk is the preferred mechanism to tame the noisyness of
> printk, but WARN/OOPS/PANIC require printing out immediately since
> the printer threads might not be able to run.
> 
> Add per CPU state to denote the priority/urgency of the output and
> provide functions to flush the printk backlog for priority elevated
> contexts and when the printing threads are not available (such as
> early boot).
> 
> Note that when a CPU is in a priority elevated state, flushing only
> occurs when dropping back to a lower priority. This allows the full
> set of printk records (WARN/OOPS/PANIC output) to be stored in the
> ringbuffer before beginning to flush the backlog.
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3943,6 +3954,12 @@ void defer_console_output(void)
>  
>  void printk_trigger_flush(void)
>  {
> +	struct cons_write_context wctxt = { };

This is weird. IMHO, this structure should be console-specific.
It should be defined on the cons_atomic_flush_con() level.

It should be doable. We do not initialize it here anyway.

Maybe it was used to pass some common information, like
prio or skip_unsafe. But it would be a messy design.
It would be hard to follow what needs to get re-initialized
and what is reused on the different levels of the API.

> +	preempt_disable();
> +	cons_atomic_flush(&wctxt, true);
> +	preempt_enable();
> +
>  	cons_wake_threads();
>  	defer_console_output();
>  }
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -1399,6 +1399,246 @@ void cons_wake_threads(void)
>  	console_srcu_read_unlock(cookie);
>  }
>  
> +/**
> + * struct cons_cpu_state - Per CPU printk context state
> + * @prio:	The current context priority level
> + * @nesting:	Per priority nest counter
> + */
> +struct cons_cpu_state {
> +	enum cons_prio	prio;
> +	int		nesting[CONS_PRIO_MAX];
> +};
> +
> +static DEFINE_PER_CPU(struct cons_cpu_state, cons_pcpu_state);
> +static struct cons_cpu_state early_cons_pcpu_state __initdata;
> +
> +/**
> + * cons_get_cpu_state - Get the per CPU console state pointer
> + *
> + * Returns either a pointer to the per CPU state of the current CPU or to
> + * the init data state during early boot.
> + */
> +static __ref struct cons_cpu_state *cons_get_cpu_state(void)
> +{
> +	if (!printk_percpu_data_ready())
> +		return &early_cons_pcpu_state;
> +
> +	return this_cpu_ptr(&cons_pcpu_state);
> +}
> +
> +/**
> + * cons_get_wctxt - Get the write context for atomic printing
> + * @con:	Console to operate on
> + * @prio:	Priority of the context
> + *
> + * Returns either the per CPU context or the builtin context for
> + * early boot.
> + */
> +static __ref struct cons_write_context *cons_get_wctxt(struct console *con,
> +						       enum cons_prio prio)
> +{
> +	if (!con->pcpu_data)
> +		return &early_cons_ctxt_data.wctxt[prio];
> +
> +	return &this_cpu_ptr(con->pcpu_data)->wctxt[prio];
> +}
> +
> +/**
> + * cons_atomic_try_acquire - Try to acquire the console for atomic printing
> + * @con:	The console to acquire
> + * @ctxt:	The console context instance to work on
> + * @prio:	The priority of the current context
> + */
> +static bool cons_atomic_try_acquire(struct console *con, struct cons_context *ctxt,
> +				    enum cons_prio prio, bool skip_unsafe)
> +{
> +	memset(ctxt, 0, sizeof(*ctxt));
> +	ctxt->console		= con;
> +	ctxt->spinwait_max_us	= 2000;
> +	ctxt->prio		= prio;
> +	ctxt->spinwait		= 1;
> +
> +	/* Try to acquire it directly or via a friendly handover */
> +	if (cons_try_acquire(ctxt))
> +		return true;

Do we really need another layer over cons_try_acquire()?

I would expect that all this is handled inside cons_try_acquire()
and it should be enough to call it once. It should always try to get
the lock in a gentle way and automatically try the hostile takeover
when ctxt->allow_hostile == 1 or ctxt->skip_unsafe == 0.

I do not know. Maybe this was a way to split the logic into
more functions. But this looks way too complicated. cons_try_acquire()
already includes the logic for the hostile take-over. And
it already gets the information whether it may use it via the
ctxt->hostile variable. But I might be missing something.
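
I mean, the single call might look like this (just a sketch; it
assumes that skip_unsafe would be passed via a new, hypothetical
ctxt->skip_unsafe flag):

	memset(ctxt, 0, sizeof(*ctxt));
	ctxt->console		= con;
	ctxt->spinwait_max_us	= 2000;
	ctxt->prio		= prio;
	ctxt->spinwait		= 1;
	ctxt->skip_unsafe	= skip_unsafe;

	/*
	 * The gentle attempt, the friendly handover, and the decision
	 * about a hostile takeover would all happen inside.
	 */
	return cons_try_acquire(ctxt);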

> +	/* Investigate whether a hostile takeover is due */
> +	if (ctxt->old_state.cur_prio >= prio)
> +		return false;
> +
> +	if (!ctxt->old_state.unsafe || !skip_unsafe)
> +		ctxt->hostile = 1;
> +	return cons_try_acquire(ctxt);
> +}
> +
> +/**
> + * cons_atomic_flush_con - Flush one console in atomic mode
> + * @wctxt:		The write context struct to use for this context
> + * @con:		The console to flush
> + * @prio:		The priority of the current context
> + * @skip_unsafe:	True, to avoid unsafe hostile takeovers
> + */
> +static void cons_atomic_flush_con(struct cons_write_context *wctxt, struct console *con,
> +				  enum cons_prio prio, bool skip_unsafe)
> +{
> +	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	bool wake_thread = false;
> +	short flags;
> +
> +	if (!cons_atomic_try_acquire(con, ctxt, prio, skip_unsafe))
> +		return;
> +
> +	do {
> +		flags = console_srcu_read_flags(con);
> +
> +		if (!console_is_usable(con, flags))
> +			break;
> +
> +		/*
> +		 * For normal prio messages let the printer thread handle
> +		 * the printing if it is available.
> +		 */
> +		if (prio <= CONS_PRIO_NORMAL && con->kthread) {
> +			wake_thread = true;
> +			break;
> +		}
> +
> +		/*
> +		 * cons_emit_record() returns false when the console was
> +		 * handed over or taken over. In both cases the context is
> +		 * no longer valid.
> +		 */
> +		if (!cons_emit_record(wctxt))
> +			return;
> +	} while (ctxt->backlog);
> +
> +	cons_release(ctxt);
> +
> +	if (wake_thread && atomic_read(&con->kthread_waiting))
> +		irq_work_queue(&con->irq_work);
> +}
> +
> +/**
> + * cons_atomic_flush - Flush consoles in atomic mode if required
> + * @printk_caller_wctxt:	The write context struct to use for this
> + *				context (for printk() context only)
> + * @skip_unsafe:		True, to avoid unsafe hostile takeovers
> + */
> +void cons_atomic_flush(struct cons_write_context *printk_caller_wctxt, bool skip_unsafe)
> +{
> +	struct cons_write_context *wctxt;
> +	struct cons_cpu_state *cpu_state;
> +	struct console *con;
> +	short flags;
> +	int cookie;
> +
> +	cpu_state = cons_get_cpu_state();
> +
> +	/*
> +	 * When in an elevated priority, the printk() calls are not
> +	 * individually flushed. This is to allow the full output to
> +	 * be dumped to the ringbuffer before starting with printing
> +	 * the backlog.
> +	 */
> +	if (cpu_state->prio > CONS_PRIO_NORMAL && printk_caller_wctxt)
> +		return;
> +
> +	/*
> +	 * Let the outermost write of this priority print. This avoids
> +	 * nasty hackery for nested WARN() where the printing itself
> +	 * generates one.
> +	 *
> +	 * cpu_state->prio <= CONS_PRIO_NORMAL is not subject to nesting
> +	 * and can proceed in order to allow atomic printing when consoles
> +	 * do not have a printer thread.
> +	 */
> +	if (cpu_state->prio > CONS_PRIO_NORMAL &&
> +	    cpu_state->nesting[cpu_state->prio] != 1)
> +		return;
> +
> +	cookie = console_srcu_read_lock();
> +	for_each_console_srcu(con) {
> +		if (!con->write_atomic)
> +			continue;
> +
> +		flags = console_srcu_read_flags(con);
> +
> +		if (!console_is_usable(con, flags))
> +			continue;
> +
> +		if (cpu_state->prio > CONS_PRIO_NORMAL || !con->kthread) {
> +			if (printk_caller_wctxt)
> +				wctxt = printk_caller_wctxt;
> +			else
> +				wctxt = cons_get_wctxt(con, cpu_state->prio);
> +			cons_atomic_flush_con(wctxt, con, cpu_state->prio, skip_unsafe);
> +		}
> +	}
> +	console_srcu_read_unlock(cookie);
> +}
> +
> +/**
> + * cons_atomic_enter - Enter a context that enforces atomic printing
> + * @prio:	Priority of the context
> + *
> + * Returns:	The previous priority that needs to be fed into
> + *		the corresponding cons_atomic_exit()
> + */
> +enum cons_prio cons_atomic_enter(enum cons_prio prio)
> +{
> +	struct cons_cpu_state *cpu_state;
> +	enum cons_prio prev_prio;
> +
> +	migrate_disable();
> +	cpu_state = cons_get_cpu_state();
> +
> +	prev_prio = cpu_state->prio;
> +	if (prev_prio < prio)
> +		cpu_state->prio = prio;
> +
> +	/*
> +	 * Increment the nesting on @cpu_state->prio so a WARN()
> +	 * nested into a panic printout does not attempt to
> +	 * scribble state.
> +	 */
> +	cpu_state->nesting[cpu_state->prio]++;
> +
> +	return prev_prio;
> +}
> +
> +/**
> + * cons_atomic_exit - Exit a context that enforces atomic printing
> + * @prio:	Priority of the context to leave
> + * @prev_prio:	Priority of the previous context for restore
> + *
> + * @prev_prio is the priority returned by the corresponding cons_atomic_enter().
> + */
> +void cons_atomic_exit(enum cons_prio prio, enum cons_prio prev_prio)
> +{
> +	struct cons_cpu_state *cpu_state;
> +
> +	cons_atomic_flush(NULL, true);
> +
> +	cpu_state = cons_get_cpu_state();
> +
> +	if (cpu_state->prio == CONS_PRIO_PANIC)
> +		cons_atomic_flush(NULL, false);
> +
> +	/*
> +	 * Undo the nesting of cons_atomic_enter() at the CPU state
> +	 * priority.
> +	 */
> +	cpu_state->nesting[cpu_state->prio]--;
> +
> +	/*
> +	 * Restore the previous priority, which was returned by
> +	 * cons_atomic_enter().
> +	 */
> +	cpu_state->prio = prev_prio;
> +
> +	migrate_enable();
> +}
> +

All this is pretty complicated. It seems that it duplicates a lot
of information and checks that are already done in cons_try_acquire().
I wonder if it is another relic of the POC that allowed taking
over a context of the same priority.

In this version, most of the nesting/recursion problems are already
handled by cons_try_acquire(). Nested/recursive contexts will
never be able to get the lock when the priority stays the same.

What about the following?

It is inspired by printk_safe handling:

#define PRINTK_EMERGENCY_CTXT_MASK	0x0000ffff
#define PRINTK_PANIC_CTXT_MASK		0xffff0000
#define PRINTK_PANIC_CTXT_OFFSET	0x00010000

DEFINE_PER_CPU(int, printk_ctxt_prio);

void printk_emergency_enter(void)
{
	this_cpu_inc(printk_ctxt_prio);
}

void printk_emergency_exit(void)
{
	this_cpu_dec(printk_ctxt_prio);
}

void printk_panic_enter(void)
{
	this_cpu_add(printk_ctxt_prio, PRINTK_PANIC_CTXT_OFFSET);
}

void printk_panic_exit(void)
{
	this_cpu_sub(printk_ctxt_prio, PRINTK_PANIC_CTXT_OFFSET);
}

enum cons_prio printk_prio_to_cons_prio(void)
{
	int pr_ctxt_prio = this_cpu_read(printk_ctxt_prio);

	if (pr_ctxt_prio & PRINTK_PANIC_CTXT_MASK)
		return CONS_PRIO_PANIC;

	if (pr_ctxt_prio & PRINTK_EMERGENCY_CTXT_MASK)
		return CONS_PRIO_EMERGENCY;

	return CONS_PRIO_NORMAL;
}

void cons_context_init(struct cons_write_context *wctxt,
		       struct console *con,
		       enum cons_prio prio,
		       bool allow_hostile)
{
	struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

	memset(wctxt, 0, sizeof(*wctxt));

	ctxt->prio = prio;
	ctxt->console = con;
	/*
	 * FIXME: This was discussed in another mail. I think that
	 * everything should be fine when every console driver has
	 * its own buffer for each console context priority.
	 */
	ctxt->pbufs = cons_get_ctxt_pbufs(con, prio);
	ctxt->allow_hostile = allow_hostile;

	if (prio >= CONS_PRIO_EMERGENCY) {
		ctxt->spinwait_max_us	= 2000;
		ctxt->spinwait		= 1;
	}
}

/*
 * Try to flush messages on NOBKL consoles using the write_atomic() callback.
 * The console context priority is defined by printk_ctxt_prio.
 *
 * It fails when it is not able to get the console atomic lock.
 * In that case the messages should be flushed by the current owner.
 *
 * It does nothing in NORMAL context when the printk thread already exists.
 * The kthread should take care of the job.
 */
void cons_try_atomic_flush_con(struct console *con, bool allow_hostile)
{
	struct cons_write_context wctxt;
	struct cons_context *ctxt = &wctxt.ctxt;
	enum cons_prio prio;

	prio = printk_prio_to_cons_prio();

	if (con->kthread && prio <= CONS_PRIO_NORMAL)
		return;

	cons_context_init(&wctxt, con, prio, allow_hostile);

	if (!cons_try_acquire_wctxt(&wctxt))
		return;

	for (;;) {
		/*
		 * Emit messages as long as there are any and we
		 * still own the lock; see the sketch below.
		 */
	}

	cons_release_wctxt(&wctxt);
}

void cons_try_atomic_flush(bool allow_hostile)
{
	struct console *con;
	int cookie;

	cookie = console_srcu_read_lock();
	for_each_console_srcu(con) {
		cons_try_atomic_flush_con(con, allow_hostile);
	}
	console_srcu_read_unlock(cookie);
}
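
FWIW, the elided loop body might be something like this (borrowing
cons_emit_record() and ctxt->backlog from the patch; not tested):

	do {
		/*
		 * cons_emit_record() returns false when the lock was
		 * handed over or taken over. The context is then no
		 * longer valid and cons_release_wctxt() must not
		 * be called.
		 */
		if (!cons_emit_record(&wctxt))
			return;
	} while (ctxt->backlog);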

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot
  2023-03-02 19:56 ` [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot John Ogness
@ 2023-04-13  9:03   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13  9:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2023-03-02 21:02:15, John Ogness wrote:
> Register a syscore_ops shutdown function to stop all threaded
> printers on shutdown/reboot. This allows printk to transition back
> to atomic printing in order to provide a robust mechanism for
> outputting the final messages.
>
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -1763,3 +1764,33 @@ void cons_nobkl_cleanup(struct console *con)
>  	cons_state_set(con, CON_STATE_REQ, &state);
>  	cons_free_percpu_data(con);
>  }
> +
> +/**
> + * printk_kthread_shutdown - shutdown all threaded printers
> + *
> + * On system shutdown all threaded printers are stopped. This allows printk
> + * to transition back to atomic printing, thus providing a robust mechanism
> + * for the final shutdown/reboot messages to be output.
> + */
> +static void printk_kthread_shutdown(void)
> +{
> +	struct console *con;
> +
> +	console_list_lock();
> +	for_each_console(con) {
> +		if (con->flags & CON_NO_BKL)
> +			cons_kthread_stop(con);
> +	}
> +	console_list_unlock();

It would make sense to explicitly flush the consoles after stopping
the kthreads. There might be pending messages...
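
Something like this at the end of printk_kthread_shutdown() (with
cons_atomic_flush() from the earlier patches; just a sketch):

	console_list_unlock();

	/* Flush records that came in while the kthreads were stopping. */
	cons_atomic_flush(NULL, true);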

> +}
> +
> +static struct syscore_ops printk_syscore_ops = {
> +	.shutdown = printk_kthread_shutdown,
> +};
> +
> +static int __init printk_init_ops(void)
> +{
> +	register_syscore_ops(&printk_syscore_ops);
> +	return 0;
> +}
> +device_initcall(printk_init_ops);

Otherwise it looks good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic
  2023-03-02 19:56 ` [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic John Ogness
@ 2023-04-13 10:08   ` Petr Mladek
  2023-04-13 12:13     ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 10:08 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Guilherme G. Piccoli,
	Luis Chamberlain, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng

On Thu 2023-03-02 21:02:16, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Invoke the atomic write enforcement functions for warn/panic to
> ensure that the information gets out to the consoles.
> 
> For the panic case, add explicit intermediate atomic flush calls to
> ensure immediate flushing at important points. Otherwise the atomic
> flushing only occurs when dropping out of the elevated priority,
> which for panic may never happen.
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of the
> atomic consoles. Optimally there should be no legacy consoles
> registered.
> 
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -329,6 +332,8 @@ void panic(const char *fmt, ...)
>  	if (_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
>  
> +	cons_atomic_flush(NULL, true);

Do we need to explicitly flush the messages here?

cons_atomic_flush() is called also from vprintk_emit(). And there are
many messages printed with the PANIC priority above.

This makes an assumption that either printk() in PANIC context
does not try to show the messages immediately or that this
explicit console_atomic_flush() tries harder. I think
that both assumptions are wrong.

> +
>  	console_unblank();
>  
>  	/*
> @@ -353,6 +358,7 @@ void panic(const char *fmt, ...)
>  		 * We can't use the "normal" timers since we just panicked.
>  		 */
>  		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
> +		cons_atomic_flush(NULL, true);

Same here.
>  
>  		for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
>  			touch_nmi_watchdog();
> @@ -371,6 +377,7 @@ void panic(const char *fmt, ...)
>  		 */
>  		if (panic_reboot_mode != REBOOT_UNDEFINED)
>  			reboot_mode = panic_reboot_mode;
> +		cons_atomic_flush(NULL, true);

And here.

>  		emergency_restart();
>  	}
>  #ifdef __sparc__
> @@ -383,12 +390,16 @@ void panic(const char *fmt, ...)
>  	}
>  #endif
>  #if defined(CONFIG_S390)
> +	cons_atomic_flush(NULL, true);

And here.

>  	disabled_wait();
>  #endif
>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>  
>  	/* Do not scroll important messages printed above */
>  	suppress_printk = 1;
> +
> +	cons_atomic_exit(CONS_PRIO_PANIC, prev_prio);

On the contrary, I would explicitly call cons_atomic_flush(NULL, false)
here instead of hiding it in cons_atomic_exit().

It would make it clear that this is the POINT where panic() tries
harder to get the messages out on NOBKL consoles.

> +
>  	local_irq_enable();
>  	for (i = 0; ; i += PANIC_TIMER_STEP) {
>  		touch_softlockup_watchdog();
> @@ -599,6 +610,10 @@ struct warn_args {
>  void __warn(const char *file, int line, void *caller, unsigned taint,
>  	    struct pt_regs *regs, struct warn_args *args)
>  {
> +	enum cons_prio prev_prio;
> +
> +	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
> +
>  	disable_trace_on_warning();
>  
>  	if (file)
> @@ -630,6 +645,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
>  
>  	/* Just a warning, don't kill lockdep. */
>  	add_taint(taint, LOCKDEP_STILL_OK);
> +
> +	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
>  }

I would move this into a separate patch and keep this one only for the
PANIC handling.

Also I think that we want to set the EMERGENCY prio also in oops_enter()?
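
I mean something like this (a sketch; it ignores nested oopses):

	static enum cons_prio oops_prev_prio;

	void oops_enter(void)
	{
		oops_prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
		[...]
	}

	void oops_exit(void)
	{
		[...]
		cons_atomic_exit(CONS_PRIO_EMERGENCY, oops_prev_prio);
	}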

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls
  2023-03-02 19:56 ` [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls John Ogness
@ 2023-04-13 12:10   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 12:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu

On Thu 2023-03-02 21:02:17, John Ogness wrote:
> Invoke the atomic write enforcement functions for rcu stalls to
> ensure that the information gets out to the consoles.

"ensure" is too strong. It is still just the best effort. It might
fail when the current console user does not pass the lock.

I would say that it will increase the chance to see the messages
on NOBKL consoles by printing the messages directly instead
of waiting for the printk thread.

> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of the
> atomic consoles. Optimally there should be no legacy consoles
> registered.

The above paragraph is a bit vague. It is not clear how exactly the
legacy consoles affect the reliability.

Does it mean that they might cause a deadlock because they are not
atomic? But there is nothing specific about rcu stalls and priority
of NOBKL consoles. This is a generic problem with legacy consoles.


> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -566,6 +568,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>  
> +	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);

Thinking loudly: This would set the EMERGENCY priority on this
CPU. But the following function:

  + rcu_dump_cpu_stacks()
    + dump_cpu_task()
      + trigger_single_cpu_backtrace()

might send an IPI and the backtraces will be printed from other CPUs.
As a result, those backtraces won't be printed with the EMERGENCY priority.

One solution would be to also have a global EMERGENCY priority.

Another possibility would be to use EMERGENCY priority also
in nmi_cpu_backtrace() which is the callback called by the IPI.

I would probably go for the global flag. printk() called in EMERGENCY
priority has to flush also the messages added by other CPUs. So the
messages added by other CPUs are printed "directly" anyway.

Also, setting the EMERGENCY priority in nmi_cpu_backtrace() is an
ad-hoc solution. The backtrace is usually called as part of another
global emergency report.
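
The global flag might be as simple as this (a sketch; the names are
made up here):

	static atomic_t printk_global_emergency = ATOMIC_INIT(0);

	void printk_global_emergency_enter(void)
	{
		atomic_inc(&printk_global_emergency);
	}

	void printk_global_emergency_exit(void)
	{
		atomic_dec(&printk_global_emergency);
	}

Then printk_prio_to_cons_prio() would check the global counter
in addition to the per-CPU state.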

> +
>  	/*
>  	 * OK, time to rat on our buddy...
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic
  2023-04-13 10:08   ` Petr Mladek
@ 2023-04-13 12:13     ` John Ogness
  2023-04-14 10:10       ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-04-13 12:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Guilherme G. Piccoli,
	Luis Chamberlain, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng

On 2023-04-13, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -329,6 +332,8 @@ void panic(const char *fmt, ...)
>>  	if (_crash_kexec_post_notifiers)
>>  		__crash_kexec(NULL);
>>  
>> +	cons_atomic_flush(NULL, true);
>
> Do we need to explicitly flush the messages here?

This is where the atomic printing actually starts (after the full dump
has been inserted into the ringbuffer).

> cons_atomic_flush() is called also from vprintk_emit(). And there are
> many messages printed with the PANIC priority above.

vprintk_emit() does not print in this case. From cons_atomic_flush():

        /*
         * When in an elevated priority, the printk() calls are not
         * individually flushed. This is to allow the full output to
         * be dumped to the ringbuffer before starting with printing
         * the backlog.
         */
        if (cpu_state->prio > CONS_PRIO_NORMAL && printk_caller_wctxt)
                return;

> This makes an assumption that either printk() in PANIC context
> does not try to show the messages immediately or that this
> explicit console_atomic_flush() tries harder. I think
> that both assumptions are wrong.

Both assumptions are correct, because until this point there has been no
effort to print.

>> @@ -353,6 +358,7 @@ void panic(const char *fmt, ...)
>>  		 * We can't use the "normal" timers since we just panicked.
>>  		 */
>>  		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
>> +		cons_atomic_flush(NULL, true);
>
> Same here.

This flush is just to make sure the rebooting message is
output. For nbcon consoles, printk() calls are never synchronous except
during early boot (before the kthreads are ready).

The same goes for the other cons_atomic_flush() calls in this function.

>>  	disabled_wait();
>>  #endif
>>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>>  
>>  	/* Do not scroll important messages printed above */
>>  	suppress_printk = 1;
>> +
>> +	cons_atomic_exit(CONS_PRIO_PANIC, prev_prio);
>
> On the contrary, I would explicitly call cons_atomic_flush(NULL, false)
> here instead of hiding it in cons_atomic_exit().

It is not hiding there. That is the semantics. After entering an atomic
block all printk's are only writing to the ringbuffer. On exiting the
atomic block the ringbuffer is flushed via atomic printing.

Exiting CONS_PRIO_PANIC has a special condition that it first tries to
safely flush all consoles, then will try the unsafe variant for consoles
that were not flushed.

> Also I think that we want to set the EMERGENCY prio also in
> oops_enter()?

Agreed.

John

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic()
  2023-03-02 19:56 ` [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic() John Ogness
@ 2023-04-13 12:20   ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 12:20 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2023-03-02 21:02:18, John Ogness wrote:
> Typically the panic() function will take care of atomic flushing the
> non-BKL consoles on panic. However, there are several users of
> console_flush_on_panic() outside of panic().
> 
> Also perform atomic flushing in console_flush_on_panic(). A new
> function cons_force_seq() is implemented to support the
> mode=CONSOLE_REPLAY_ALL feature.
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3160,6 +3160,28 @@ void console_unblank(void)
>   */
>  void console_flush_on_panic(enum con_flush_mode mode)
>  {
> +	struct console *c;
> +	short flags;
> +	int cookie;
> +	u64 seq;
> +
> +	seq = prb_first_valid_seq(prb);
> +
> +	/*
> +	 * Safely flush the atomic consoles before trying to flush any
> +	 * BKL/legacy consoles.
> +	 */
> +	if (mode == CONSOLE_REPLAY_ALL) {
> +		cookie = console_srcu_read_lock();
> +		for_each_console_srcu(c) {
> +			flags = console_srcu_read_flags(c);
> +			if (flags & CON_NO_BKL)
> +				cons_force_seq(c, seq);
> +		}
> +		console_srcu_read_unlock(cookie);
> +	}
> +	cons_atomic_flush(NULL, true);
> +
>  	if (!have_bkl_console)
>  		return;
>  
> @@ -3174,12 +3196,6 @@ void console_flush_on_panic(enum con_flush_mode mode)
>  	console_may_schedule = 0;
>  
>  	if (mode == CONSOLE_REPLAY_ALL) {
> -		struct console *c;
> -		int cookie;
> -		u64 seq;
> -
> -		seq = prb_first_valid_seq(prb);
> -
>  		cookie = console_srcu_read_lock();
>  		for_each_console_srcu(c) {
>  			/*

The code in this for_each_console_srcu(c) loop will reset c->seq
for all consoles, including the NO_BKL ones. It should do it
only for the legacy consoles.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* (k)thread: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
                     ` (4 preceding siblings ...)
  2023-04-06 13:19   ` misc: " Petr Mladek
@ 2023-04-13 13:28   ` Petr Mladek
  5 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 13:28 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-03-02 21:02:11, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add the infrastructure to create a printer thread per console along
> with the required thread function, which is takeover/handover aware.
> 

Nit:

Several variable and function names use "thread":

  + con.thread_pbufs
  + con.write_thread()
  + printk_threads_enabled

and many others use "kthread":

  + con.kthread
  + con.kthread_waiting
  + cons_kthread_wake()
  + cons_kthread_create()
  + cons_kthread_should_wakeup()

I do not see any pattern. It would be nice to choose just one variant
for the cons/printk API. I slightly prefer "kthread" but "thread"
would be fine as well.

While we are on the (k)thread naming stuff: we historically talk
about it as a printk (k)thread. But I often feel that it
rather should be a console (k)thread, especially when we have
one per console.

That said, both variants make sense. The thread is used for showing
messages from the printk ring buffer.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem()
  2023-03-17 11:37     ` John Ogness
@ 2023-04-13 13:42       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 13:42 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Fri 2023-03-17 12:43:56, John Ogness wrote:
> On 2023-03-07, Petr Mladek <pmladek@suse.com> wrote:
> > So that this change would cause a non-paired console_unlock().
> > And console_unlock might still deadlock on the console_sem->lock.
> 
> Yes, but at least it would have flushed beforehand.
> 
> > One solution would be to call console_flush_all() directly in
> > console_flush_on_panic() without taking console_lock().
> >
> > It should not be worse than the current code which ignores
> > the console_trylock() return value.
> 
> I think your suggestion is acceptable.
> 
> > Note that it mostly works because console_flush_on_panic() is called
> > when other CPUs are supposed to be stopped.
> >
> > We only would need to prevent other CPUs from flushing messages
> > as well if they were still running by chance. But we actually already
> > do this, see abandon_console_lock_in_panic(). Well, we should
> > make sure that the abandon_console_lock_in_panic() check is
> > done before flushing the first message.
> >
> > All these changes together would prevent deadlock on
> > console_sem->lock.  But the synchronization "guarantees" should stay
> > the same.
> 
> We could also update console_trylock() and console_lock() to fail and
> infinitely sleep, respectively, when abandon_console_lock_in_panic() is
> true. That would prevent CPUs from newly acquiring the console lock and
> interfering with the panic CPU.

Interesting idea. It should be safe after panic() tries to
stop the CPUs. But I am slightly worried about doing this earlier.

I wonder if it might block, for example, trigger_all_cpu_backtrace()
that is called when the (panic_print & PANIC_PRINT_ALL_CPU_BT) bit is set.
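
For illustration, the console_trylock() part of the idea might look
like this (abandon_console_lock_in_panic() as in the current code;
just a sketch):

	int console_trylock(void)
	{
		/* Prevent non-panic CPUs from newly taking the lock. */
		if (abandon_console_lock_in_panic())
			return 0;

		if (down_trylock_console_sem())
			return 0;
		console_locked = 1;
		console_may_schedule = 0;
		return 1;
	}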

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 03/18] printk: Consolidate console deferred printing
  2023-03-17 13:05     ` John Ogness
@ 2023-04-13 15:15       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-13 15:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Fri 2023-03-17 14:11:01, John Ogness wrote:
> On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2321,7 +2321,10 @@ asmlinkage int vprintk_emit(int facility, int level,
> >>  		preempt_enable();
> >>  	}
> >>  
> >> -	wake_up_klogd();
> >> +	if (in_sched)
> >> +		defer_console_output();
> >> +	else
> >> +		wake_up_klogd();
> >
> > Nit: I would add an empty line here. Or I would move this up into the
> >      previous if (in_sched()) condition.
> 
> Empty line is ok. I do not want to move it up because the above
> condition gets more complicated later. IMHO a simple if/else for
> specifying what the irq_work should do is the most straight forward
> here.
> 
> >> @@ -3811,11 +3814,30 @@ static void __wake_up_klogd(int val)
> >>  	preempt_enable();
> >>  }
> >>  
> >> +/**
> >> + * wake_up_klogd - Wake kernel logging daemon
> >> + *
> >> + * Use this function when new records have been added to the ringbuffer
> >> + * and the console printing for those records is handled elsewhere. In
> >
> > "elsewhere" is ambiguous. I would write:
> >
> > "and the console printing for those records maybe handled in this context".
> 
> The reason for using the word "elsewhere" is because in later patches it
> is also the printing threads that can handle it. I can change it to
> "this context" for this patch, but then after adding threads I will need
> to adjust the comment again. How about:
> 
> "and the console printing for those records should not be handled by the
> irq_work context because another context will handle it."

It is better but still a bit hard to follow. As a reader, I see three
contexts:

   + context that calls wake_up_klogd()
   + irq_work context
   + another context that would handle the console printing

The confusing part is the "another context". It might be the
context calling wake_up_klogd(). It feels like scratching the
right ear with the left hand ;-)

In fact, the next sentence "In this case only the logging
daemon needs to be woken." is misleading as well. The printk
kthreads also need to be woken but that is done by another function.

OK, what about?

 * wake_up_klogd - Wake kernel logging daemon
 *
 * Use this function when new records have been added to the ringbuffer
 * and the console printing does not have to be deferred to irq_work
 * context. This function will only wake the logging daemons.


Heh, the "wake_up_klogd_work" has became confusing since it started
handling deferred console output. And it is even more confusing now
when it does not handle the kthreads which are yet another deferred
output. But I can't think of any reasonable solution at the moment.

Maybe, we should somehow distinguish the API that will handle only
the legacy consoles. For example, suspend_console() handles both
but console_flush_all() will handle only the legacy ones.

I think that we are going to use nbcon_ prefix for the API
handling the new consoles. Maybe we could use another prefix
for the legacy-consoles-specific API.

Hmm, what about?

    + "bcon_" like the opposite of "nbcon_" but it might be
      confused with boot console

    + "lcon_" like "legacy" or "locked" consoles

    + "scon" like synchronized or serialized consoles.


Honestly, I am not sure how important this is. But it might
be pretty helpful for anyone who would try to understand the code
in the future. And this rework might be really challenging for
future archaeologists. Not to mention that legacy consoles will
likely stay with us for many years, maybe decades.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 04/18] printk: Add per-console suspended state
  2023-03-17 13:22     ` John Ogness
@ 2023-04-14  9:56       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-14  9:56 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Fri 2023-03-17 14:28:04, John Ogness wrote:
> On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/include/linux/console.h
> >> +++ b/include/linux/console.h
> >> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
> >>   *			receiving the printk spam for obvious reasons.
> >>   * @CON_EXTENDED:	The console supports the extended output format of
> >>   *			/dev/kmesg which requires a larger output buffer.
> >> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
> >> + *			printing callbacks must not be called.
> >>   */
> >>  enum cons_flags {
> >>  	CON_PRINTBUFFER		= BIT(0),
> >> @@ -162,6 +164,7 @@ enum cons_flags {
> >>  	CON_ANYTIME		= BIT(4),
> >>  	CON_BRL			= BIT(5),
> >>  	CON_EXTENDED		= BIT(6),
> >> +	CON_SUSPENDED		= BIT(7),
> >
> > We have to show it in /proc/consoles, see fs/proc/consoles.c.
> 
> Are we supposed to show all flags in /proc/consoles? Currently
> CON_EXTENDED is not shown either.

Good question. It is true that the CON_SUSPENDED flag is not that useful.
Userspace will likely be frozen when this flag is set. I am fine
with not showing it.

Well, CON_EXTENDED might actually be useful. It defines the format
of the console messages. It would be nice to fix this but it is
not in the scope of this patchset.

> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2574,11 +2590,26 @@ void suspend_console(void)
> >>  
> >>  void resume_console(void)
> >>  {
> >> +	struct console *con;
> >> +
> >>  	if (!console_suspend_enabled)
> >>  		return;
> >>  	down_console_sem();
> >>  	console_suspended = 0;
> >>  	console_unlock();
> >> +
> >> +	console_list_lock();
> >> +	for_each_console(con)
> >> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
> >> +	console_list_unlock();
> >> +
> >> +	/*
> >> +	 * Ensure that all SRCU list walks have completed. All printing
> >> +	 * contexts must be able to see they are no longer suspended so
> >> +	 * that they are guaranteed to wake up and resume printing.
> >> +	 */
> >> +	synchronize_srcu(&console_srcu);
> >> +
> >
> > The setting of the global "console_suspended" and per-console
> > CON_SUSPENDED flag is not synchronized. As a result, they might
> > become inconsistent:
> 
> They do not need to be synchronized and it doesn't matter if they become
> inconsistent. With this patch they are no longer related. One is for
> tracking the state of the console (CON_SUSPENDED), the other is for
> tracking the suspend trick for the console_lock.

OK, the race might be theoretical because it would be a race between
suspend_console() and resume_console(). But it exists:

CPU0					CPU1

suspend_console()

  console_list_lock();
    for_each_console(con)
      con->flags |= CON_SUSPENDED;
  console_list_unlock();

					resume_console()

					  down_console_sem();
					    console_suspended = 0;
					  console_unlock();

					  console_list_lock();
					    for_each_console(con)
					      con->flags &= ~CON_SUSPENDED;
					  console_list_unlock();

  down_console_sem();
    console_suspended = 1;
  up_console_sem();

Result:

    console_suspended == 1;
    con->flags & CON_SUSPENDED == 0;

    + NO_BKL consoles would work because they ignore console_suspended.
    + legacy consoles won't work because console_unlock() would
      return early.

This does not look right.


> > I think that we could just remove the global "console_suspended" flag.
> >
> > IMHO, it used to be needed to avoid moving the global "console_seq" forward
> > when the consoles were suspended. But it is not needed now with the
> > per-console con->seq. console_flush_all() skips consoles when
> > console_is_usable() fails and it bails out when there is no progress.
> 
> The @console_suspended flag is used to allow console_lock/console_unlock
> to be called without triggering printing. This is probably so that vt
> code can make use of the console_lock for its own internal locking, even
> when in a state where ->write() should not be called. I would expect we
> still need it, even if the consoles do not.

But it would still work. console_unlock() could always call
console_flush_all() now. It just does not make any progress
when all consoles have the CON_SUSPENDED flag set.

Note that this is a new behavior since the commit a699449bb13b70b8bd
("printk: refactor and rework printing logic"). Before this commit,
the main loop in console_unlock() always incremented console_seq
even when no console was enabled. This is why console_unlock()
had to skip the main loop when the consoles were suspended.

I believe that @console_suspended is no longer needed.
Let's replace it with the per-console flag and do not worry
about races.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic
  2023-04-13 12:13     ` John Ogness
@ 2023-04-14 10:10       ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-14 10:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Guilherme G. Piccoli,
	Luis Chamberlain, David Gow, Tiezhu Yang, Daniel Vetter,
	tangmeng

On Thu 2023-04-13 14:19:13, John Ogness wrote:
> On 2023-04-13, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/kernel/panic.c
> >> +++ b/kernel/panic.c
> >> @@ -329,6 +332,8 @@ void panic(const char *fmt, ...)
> >>  	if (_crash_kexec_post_notifiers)
> >>  		__crash_kexec(NULL);
> >>  
> >> +	cons_atomic_flush(NULL, true);
> >
> > Do we need to explicitly flush the messages here?
> 
> This is where the atomic printing actually starts (after the full dump
> has been inserted into the ringbuffer).
> 
> > cons_atomic_flush() is called also from vprintk_emit(). And there are
> > many messages printed with the PANIC priority above.
> 
> vprintk_emit() does not print in this case. From cons_atomic_flush():
> 
>         /*
>          * When in an elevated priority, the printk() calls are not
>          * individually flushed. This is to allow the full output to
>          * be dumped to the ringbuffer before starting with printing
>          * the backlog.
>          */
>         if (cpu_state->prio > CONS_PRIO_NORMAL && printk_caller_wctxt)
>                 return;

OK, what is the motivation for this behavior, please?
Does it have any advantages?

> 
> > This makes an assumption that either printk() in PANIC context
> > does not try to show the messages immediately or that this
> > explicit console_atomic_flush() tries harder. I think
> > that both assumptions are wrong.
> 
> Both assumptions are correct, because until this point there has been no
> effort to print.

Honestly, this makes me nervous. It means that panic() messages will
not reach the console unless they are explicitly flushed.

First, it is error-prone because it requires calling
console_atomic_flush() in all relevant code paths at the right
locations.

Second, it expects that panic() code could never fail between
the explicit console_atomic_flush() calls. If it failed, it might
be pretty useful to see the last printed message.

Third, messages might get lost when there are too many. And it is
realistic. For example, see panic_print_sys_info(); it might add
quite long reports.

I would really prefer to flush atomic consoles with every printk()
unless there is a really good reason not to do it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-04-06  9:46   ` port lock: " Petr Mladek
@ 2023-04-20  9:55     ` Petr Mladek
  2023-04-20 10:33       ` John Ogness
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-04-20  9:55 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-04-06 11:46:37, Petr Mladek wrote:
> On Thu 2023-03-02 21:02:11, John Ogness wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Add the infrastructure to create a printer thread per console along
> > with the required thread function, which is takeover/handover aware.
> 
> > --- a/kernel/printk/printk_nobkl.c
> > +++ b/kernel/printk/printk_nobkl.c
> > +/**
> > + * cons_kthread_func - The printk thread function
> > + * @__console:	Console to operate on
> > + */
> > +static int cons_kthread_func(void *__console)
> > +{
> > +	struct console *con = __console;
> > +	struct cons_write_context wctxt = {
> > +		.ctxt.console	= con,
> > +		.ctxt.prio	= CONS_PRIO_NORMAL,
> > +		.ctxt.thread	= 1,
> > +	};
> > +	struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> > +	unsigned long flags;
> > +	short con_flags;
> > +	bool backlog;
> > +	int cookie;
> > +	int ret;
> > +
> > +	for (;;) {
> > +		atomic_inc(&con->kthread_waiting);
> > +
> > +		/*
> > +		 * Provides a full memory barrier vs. cons_kthread_wake().
> > +		 */
> > +		ret = rcuwait_wait_event(&con->rcuwait,
> > +					 cons_kthread_should_wakeup(con, ctxt),
> > +					 TASK_INTERRUPTIBLE);
> > +
> > +		atomic_dec(&con->kthread_waiting);
> > +
> > +		if (kthread_should_stop())
> > +			break;
> > +
> > +		/* Wait was interrupted by a spurious signal, go back to sleep */
> > +		if (ret)
> > +			continue;
> > +
> > +		for (;;) {
> > +			cookie = console_srcu_read_lock();
> > +
> > +			/*
> > +			 * Ensure this stays on the CPU to make handover and
> > +			 * takeover possible.
> > +			 */
> > +			if (con->port_lock)
> > +				con->port_lock(con, true, &flags);
> 
> IMHO, we should use a more generic name. This should be a lock that
> provides full synchronization between con->write() and other
> operations on the device used by the console.
> 
> "port_lock" is specific for the serial consoles. IMHO, other consoles
> might use another lock. IMHO, tty uses "console_lock" internally for
> this purpose. netconsole seems to has "target_list_lock" that might
> possible have this purpose, s390 consoles are using sclp_con_lock,
> sclp_vt220_lock, or get_ccwdev_lock(raw->cdev).
> 
> 
> Honestly, I expected that we could replace these locks by
> cons_acquire_lock(). I know that the new lock is special: sleeping,
> timing out, allowing hand-over by priorities.
> 
> But I think that we might implement cons_acquire_lock() so that it
> would always busy wait without any timeout. And use some "priority"
> that would never hand over the lock a voluntary way. The only
> difference would be that it is sleeping. But it might
> be acceptable in many cases.
> 
> Using the new lock instead of port->lock would allow removing
> the tricks with using spin_trylock() when oops_in_progress is set.
> 
> That said, I am not sure if this is possible without major changes.
> For example, in case of serial consoles, it would require touching
> the layer using port->lock.
> 
> Also it would require a 1:1 relation between struct console and the output
> device lock. I am not sure if it is always the case. On the other
> hand, adding some infrastructure for this 1:1 relationship would
> help with a smooth transition from the boot to the real console
> driver.
> 
> 
> OK, let's first define what the two locks are supposed to synchronize.
> My understanding is that this patchset uses them the following way:
> 
>     + The new lock (atomic_state) is used to serialize emitting
>       messages between different write contexts. It replaces
>       the functionality of console_lock.
> 
>       It is a per-console sleeping lock, allows voluntary and hostile
>       hand-over using priorities and spinning with a timeout.
> 
> 
>     + The port_lock is used to synchronize various operations
>       of the console driver/device, like probe, init, exit,
>       configuration update.
> 
>       It is typically a per-console driver/device spin lock.
> 
> 
> I guess that we would want to keep both locks:
> 
>     + it might help to keep the rework manageable
> 
>     + the sleeping lock might complicate some operations;
>       raw_spin_lock might be necessary at least on
>       non-RT systems.

I forgot to check how these two locks are supposed to be used
in write_atomic().

It seems that cons_atomic_flush_con() takes only the new lock
(atomic_state) and ignores the port_lock(). It should be safe
against write_thread(). But it is not safe against other
operations with the console device that are synchronized
only by the port_lock().

This looks like a potential source of problems and regressions.

Do I miss something, please?
Is there any plan how to deal with this?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-04-20  9:55     ` Petr Mladek
@ 2023-04-20 10:33       ` John Ogness
  2023-04-20 13:33         ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: John Ogness @ 2023-04-20 10:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On 2023-04-20, Petr Mladek <pmladek@suse.com> wrote:
>> OK, let's first define what the two locks are supposed to synchronize.
>> My understanding is that this patchset uses them the following way:
>> 
>>     + The new lock (atomic_state) is used to serialize emitting
>>       messages between different write contexts. It replaces
>>       the functionality of console_lock.

It replaces the functionality of console_lock, but operates at a finer
level. It is serializing all access to the hardware registers involved
in outputting. For the 8250 driver, this is the IER register.

>>       It is a per-console sleeping lock, allows voluntary and hostile
>>       hand-over using priorities and spinning with a timeout.

It is not a sleeping lock. It is used as a trylock or spinning with
timeout. It has the special feature that it can be handed over to or
stolen by another context with a higher ownership priority.

>>     + The port_lock is used to synchronize various operations
>>       of the console driver/device, like probe, init, exit,
>>       configuration update.
>> 
>>       It is typically a per-console driver/device spin lock.
>> 
>> 
>> I guess that we would want to keep both locks:

I agree because the port_lock has a much larger scope and is fully
preemptible under PREEMPT_RT.

> I forgot to check how these two locks are supposed to be used
> in write_atomic().
>
> It seems that cons_atomic_flush_con() takes only the new lock
> (atomic_state) and ignores the port_lock(). It should be safe
> against write_kthread(). But it is not safe against other
> operations with the console device that are synchronized
> only by the port_lock().

Yes, it is because the console drivers will also take the atomic_state
lock when needed. You can see this in the POC patch I posted [0].

For example, a new function serial8250_enter_unsafe() is used by the
serial drivers to mark the beginning of an unsafe section. To use this
function, the port_lock must be held. This function additionally takes
the atomic_state lock. Then the driver is allowed to touch hardware
registers related to outputting (IER).

But typically the driver will use a new higher level function, for
example serial8250_in_IER(), which will enter unsafe, read the register,
and exit unsafe. This provides the necessary synchronization against
write_atomic() (for the 8250 driver).
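
Roughly along these lines, if I read the POC correctly (the exit
counterpart serial8250_exit_unsafe() is assumed here; details may
differ):

	static unsigned int serial8250_in_IER(struct uart_8250_port *up)
	{
		struct uart_port *port = &up->port;
		unsigned int ier;

		if (uart_console(port))
			serial8250_enter_unsafe(up);

		ier = serial_in(up, UART_IER);

		if (uart_console(port))
			serial8250_exit_unsafe(up);

		return ier;
	}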

Please also remember that hostile takeovers of drivers in unsafe
sections are done as a last resort in panic, after all other nbcon
consoles have safely flushed their buffers. So we should not spend too
many brain cycles on "what if the atomic_state lock is stolen while in
an unsafe section" questions. The answer is: then you are in "hope and
pray" mode.

John

[0] https://lore.kernel.org/lkml/877cv1geo4.fsf@jogness.linutronix.de

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-04-20 10:33       ` John Ogness
@ 2023-04-20 13:33         ` Petr Mladek
  2023-04-21 16:15           ` Petr Mladek
  0 siblings, 1 reply; 92+ messages in thread
From: Petr Mladek @ 2023-04-20 13:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-04-20 12:39:31, John Ogness wrote:
> On 2023-04-20, Petr Mladek <pmladek@suse.com> wrote:
> >> OK, let's first define what the two locks are supposed to synchronize.
> >> My understanding is that this patchset uses them the following way:
> >> 
> >>     + The new lock (atomic_state) is used to serialize emiting
> >>       messages between different write contexts. It replaces
> >>       the functionality of console_lock.
> 
> It replaces the functionality of console_lock, but operates at a finer
> level. It is serializing all access to the hardware registers involved
> in outputting. For the 8250 driver, this is the IER register.
> 
> >>       It is a per-console sleeping lock, allows voluntary and hostile
> >>       hand-over using priorities and spinning with a timeout.
> 
> It is not a sleeping lock. It is used as a trylock or spinning with
> timeout. It has the special feature that it can be handed over to or
> stolen by another context with a higher ownership priority.

What prevents the owner from sleeping, please?
Is disabled preemption enforced?

I see migrate_disable() in cons_kthread_func() when con->port_lock
does not exist. It means that preemption is possible in
this case.

> >>     + The port_lock is used to synchronize various operations
> >>       of the console driver/device, like probe, init, exit,
> >>       configuration update.
> >> 
> >>       It is typically a per-console driver/device spin lock.
> >> 
> >> 
> >> I guess that we would want to keep both locks:
> 
> I agree because the port_lock has a much larger scope and is fully
> preemptible under PREEMPT_RT.

Do you really want to call the entire cons_emit_record()
in a non-preemptible context on RT?

It should be enough to disable preemption in the unsafe sections.
IMHO, it might stay enabled in the "safe" sections. The only
drawback would be that the emergency context might take over
the lock in the middle of a line. But emergency contexts should be
rare, so this should rarely happen.


> > I forgot to check how these two locks are supposed to be used
> > in write_atomic().
> >
> > It seems that cons_atomic_flush_con() takes only the new lock
> > (atomic_state) and ignores the port_lock(). It should be safe
> > against write_kthread(). But it is not safe against other
> > operations with the console device that are synchronized
> > only by the port_lock().
> 
> Yes, it is because the console drivers will also take the atomic_state
> lock when needed. You can see this in the POC patch I posted [0].
> 
> For example, a new function serial8250_enter_unsafe() is used by the
> serial drivers to mark the beginning of an unsafe section. To use this
> function, the port_lock must be held. This function additionally takes
> the atomic_state lock. Then the driver is allowed to touch hardware
> registers related to outputting (IER).

I see.

I have missed it because the driver is still taking port->lock
directly in many locations. It seems to be in the init code.
It makes sense because it is called before the port gets
registered as a console.

Sigh, sigh, sigh, this scares me a lot. How do we know that all
code paths are correctly serialized or that they could never
be called in parallel?

Especially, the generic serial8250 API seems to be used in many drivers
together with a lot of custom code.

For example, what about serial8250_handle_irq()? How is it serialized
against serial8250_console_write_atomic()?

> But typically the driver will use a new higher level function, for
> example serial8250_in_IER(), which will enter unsafe, read the register,
> and exit unsafe. This provides the necessary synchronization against
> write_atomic() (for the 8250 driver).

Hmm, I think that we should create an API that would take the right
locks according to the state:

     + port->lock when the console is not registered
     + port->lock + con->atomic_state lock when the console is registered

and use it everywhere instead of taking port->lock directly.

> Please also remember that hostile takeovers of drivers in unsafe
> sections are done as a last resort in panic, after all other nbcon
> consoles have safely flushed their buffers. So we should not spend too
> many brain cycles on "what if the atomic_state lock is stolen while in
> an unsafe section" questions. The answer is: then you are in "hope and
> pray" mode.

I know. And the hostile takeover is not my concern.

My concern is races between write_atomic() in emergency context
and other driver code serialized only by the port->lock.

We need an API that will make sure that any code serialized
by port->lock is properly serialized against write_atomic()
when the console is registered. Also, we need to synchronize
the registration with port->lock.

Best Regards,
Petr


* Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads
  2023-04-20 13:33         ` Petr Mladek
@ 2023-04-21 16:15           ` Petr Mladek
  0 siblings, 0 replies; 92+ messages in thread
From: Petr Mladek @ 2023-04-21 16:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2023-04-20 15:33:10, Petr Mladek wrote:
> On Thu 2023-04-20 12:39:31, John Ogness wrote:
> I know. And the hostile takeover is not my concern.
> 
> My concern is races between write_atomic() in emergency context
> and other driver code serialized only by the port->lock.
> 
> We need an API that will make sure that any code serialized
> by port->lock is properly serialized against write_atomic()
> when the console is registered.

I thought more about it. My idea is the following:

A. The nbcon side might have basically four modes
   for taking the new nbcon lock, with four interfaces:

    nbcon_trylock(struct console *con,
		  enum cons_prio prio);
    nbcon_trylock_emergency(struct console *con,
		  enum cons_prio prio);
    nbcon_trylock_panic(struct console *con,
		  enum cons_prio prio);
    nbcon_lock(struct console *con,
		  enum cons_prio prio);

, where

   + nbcon_trylock() would use the current approach for
     the printk kthread. It means that it would try to get
     the lock with a timeout. But it would never try to
     steal the lock.

   + nbcon_trylock_emergency() would use the current approach
     used in emergency context. It would busy wait and
     then try to steal the lock. But it would take over the lock
     only when the owner is in a safe section.

   + nbcon_trylock_panic() would behave the same way as
     nbcon_trylock_emergency(). But it would allow taking
     over the lock even when it is unsafe. It might
     still fail when it is not called on the CPU that
     handles the panic().

   + nbcon_lock() would wait until the lock is really
     available.

  and

  enum cons_prio would be one of the four priorities.

  The API should disable CPU migration to make sure that
  the CPU stays the same until the lock is released.

  The caller should remember the priority somewhere,
  e.g. in struct cons_ctxt.
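
  For illustration, an emergency-context caller might then look
  like this (a hypothetical helper built on the interfaces above;
  I am also assuming a CON_PRIO_EMERGENCY value and a prio member
  in struct cons_ctxt):

static bool emergency_flush_con(struct console *con, struct cons_ctxt *ctxt)
{
	if (!nbcon_trylock_emergency(con, CON_PRIO_EMERGENCY))
		return false;

	/* remember the priority, e.g. for later enter/exit_unsafe calls */
	ctxt->prio = CON_PRIO_EMERGENCY;

	/* ... flush the pending records via write_atomic() ... */

	nbcon_unlock(con, ctxt->prio);
	return true;
}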


B. The port->lock side would switch to the new nbcon lock
   when the console is registered. There are two big questions
   that come to my mind:

  1. The original code does not expect that it might lose
     the lock.

     It should be enough to mark the entire section as unsafe.
     In that case, only the final panic() call might steal
     the lock.


  2. The console registration must be done in a safe way
     to make sure that all callers will use the same
     real lock (port->lock or nbcon_lock).


  IMHO, the uart_port part might look like:

void uart_port_lock_irqsafe(struct uart_port *port,
	int *cookie,
	unsigned long *flags)
{
	struct console *con;

try_again:
	/* Synchronization against console registration. */
	*cookie = console_srcu_read_lock();

	con = rcu_access_pointer(port->cons);

	if (!can_use_nbcon_lock(con)) {
		/* Only the port lock is available. */
		spin_lock_irqsave(&port->lock, *flags);
		port->locked = LOCKED_BY_PORT_LOCK;
		return;
	}

	/*
	 * The nbcon lock is available. Take it instead of
	 * the port->lock. The only exception is when
	 * a registration is in progress. In this case,
	 * port->lock has to be taken as well.
	 *
	 * It is always taken with the normal priority
	 * when called from the port lock side.
	 */
	nbcon_lock(con, CON_PRIO_NORMAL);
	local_irq_save(*flags);

	if (con->registration_in_progress) {
		spin_lock(&port->lock);
		port->locked = LOCKED_BY_BOTH_LOCKS;
	} else {
		port->locked = LOCKED_BY_NBCON_LOCK;
	}

	/*
	 * Make sure that only nbcon_trylock_panic() would
	 * be able to steal this lock.
	 */
	if (!nbcon_enter_unsafe(con, CON_PRIO_NORMAL)) {
		/* revert all the locks taken above, then retry */
		goto try_again;
	}
}

void uart_port_unlock_irqrestore(struct uart_port *port,
	int *cookie, unsigned long *flags)
{
	struct console *con;

	con = rcu_access_pointer(port->cons);

	switch (port->locked) {
	case LOCKED_BY_PORT_LOCK:
		spin_unlock_irqrestore(&port->lock, *flags);
		break;
	case LOCKED_BY_BOTH_LOCKS:
		spin_unlock(&port->lock);
		fallthrough;
	case LOCKED_BY_NBCON_LOCK:
		nbcon_exit_unsafe(con, CON_PRIO_NORMAL);
		local_irq_restore(*flags);
		nbcon_unlock(con, CON_PRIO_NORMAL);
		break;
	}

	console_srcu_read_unlock(*cookie);
}
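
A converted call site would then look like this (hypothetical
driver function, just for illustration):

static void serial8250_do_config(struct uart_port *port)
{
	unsigned long flags;
	int cookie;

	uart_port_lock_irqsafe(port, &cookie, &flags);
	/* code that used to run under spin_lock_irqsave(&port->lock) */
	uart_port_unlock_irqrestore(port, &cookie, &flags);
}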


and finally the registration:

void register_console(struct console *newcon)
{
[...]
	if (newcon->flags & CON_NBCON) {
		nbcon_lock(newcon, CON_PRIO_NORMAL);
		newcon->registration_in_progress = true;
		nbcon_unlock(newcon, CON_PRIO_NORMAL);

		/*
		 * Make sure that all port lock callers take both
		 * nbcon_lock() and port->lock from now on.
		 */
		synchronize_srcu(&console_srcu);
	}

	/* Insert the console into console_list */

	if (newcon->flags & CON_NBCON) {
		nbcon_lock(newcon, CON_PRIO_NORMAL);
		newcon->registration_in_progress = false;
		nbcon_unlock(newcon, CON_PRIO_NORMAL);
	}
[...]
}

and a similar thing in unregister_console().
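
My guess at the unregister side would be something like this (same
sketch quality as above; the flag simply stays set until the console
is gone from console_list):

void unregister_console(struct console *con)
{
[...]
	if (con->flags & CON_NBCON) {
		nbcon_lock(con, CON_PRIO_NORMAL);
		con->registration_in_progress = true;
		nbcon_unlock(con, CON_PRIO_NORMAL);

		/*
		 * Make sure that all callers take both locks
		 * before the console disappears from console_list.
		 */
		synchronize_srcu(&console_srcu);
	}

	/* Remove the console from console_list */
[...]
}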

I am not sure if I should send this on a Friday evening. But I have
reworked it many times and I no longer see any obvious reason why it
could not work.

My relief is that it builds on top of your code. It basically just
adds the port_lock interface. I hope that it would actually simplify
things a lot.

Well, a huge challenge might be to replace all spin_lock(&port->lock)
calls with the new API. There is a lot of code shared between various
consoles, and we want to be able to migrate them one-by-one.

On the other hand, the new port_lock() API should behave as a simple
spin lock when port->cons is a legacy console.

Best Regards,
Petr


end of thread

Thread overview: 92+ messages
2023-03-02 19:56 [PATCH printk v1 00/18] threaded/atomic console support John Ogness
2023-03-02 19:56 ` [PATCH printk v1 01/18] kdb: do not assume write() callback available John Ogness
2023-03-07 14:57   ` Petr Mladek
2023-03-07 16:34   ` Doug Anderson
2023-03-09 10:52   ` Daniel Thompson
2023-03-09 11:26     ` Petr Mladek
2023-03-09 11:30       ` Daniel Thompson
2023-03-02 19:56 ` [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem() John Ogness
2023-03-07 16:05   ` Petr Mladek
2023-03-17 11:37     ` John Ogness
2023-04-13 13:42       ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 03/18] printk: Consolidate console deferred printing John Ogness
2023-03-08 13:15   ` Petr Mladek
2023-03-17 13:05     ` John Ogness
2023-04-13 15:15       ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 04/18] printk: Add per-console suspended state John Ogness
2023-03-08 14:40   ` Petr Mladek
2023-03-17 13:22     ` John Ogness
2023-04-14  9:56       ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 05/18] printk: Add non-BKL console basic infrastructure John Ogness
2023-03-09 14:08   ` global states: was: " Petr Mladek
2023-03-17 13:29     ` John Ogness
2023-03-09 15:32   ` naming: " Petr Mladek
2023-03-17 13:39     ` John Ogness
2023-03-21 16:04   ` union: was: " Petr Mladek
2023-03-27 16:28     ` John Ogness
2023-03-28  8:20       ` Petr Mladek
2023-03-28  9:42         ` John Ogness
2023-03-28 12:52           ` Petr Mladek
2023-03-28 13:47           ` Steven Rostedt
2023-03-02 19:56 ` [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic John Ogness
2023-03-06  9:07   ` Dan Carpenter
2023-03-06  9:39     ` John Ogness
2023-03-13 16:07   ` Petr Mladek
2023-03-17 14:56     ` John Ogness
2023-03-20 16:10       ` Petr Mladek
2023-03-17 17:34   ` simplify: was: " Petr Mladek
2023-03-21 15:36     ` Petr Mladek
2023-04-02 18:39       ` John Ogness
2023-03-02 19:56 ` [PATCH printk v1 07/18] printk: nobkl: Add buffer management John Ogness
2023-03-21 16:38   ` Petr Mladek
2023-03-23 13:38     ` John Ogness
2023-03-23 15:25       ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 08/18] printk: nobkl: Add sequence handling John Ogness
2023-03-27 15:45   ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 09/18] printk: nobkl: Add print state functions John Ogness
2023-03-29 13:58   ` buffer write race: " Petr Mladek
2023-03-29 14:33     ` John Ogness
2023-03-30 11:54       ` Petr Mladek
2023-03-29 14:05   ` misc details: was: " Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 10/18] printk: nobkl: Add emit function and callback functions for atomic printing John Ogness
2023-03-03  0:19   ` kernel test robot
2023-03-03 10:55     ` John Ogness
2023-03-31 10:29   ` dropped handling: was: " Petr Mladek
2023-03-31 10:36   ` semantic: " Petr Mladek
     [not found]     ` <87edp29kvq.fsf@jogness.linutronix.de>
     [not found]       ` <ZCraqrkqFtsfLWuP@alley>
     [not found]         ` <87ilecsrvl.fsf@jogness.linutronix.de>
2023-04-04 14:09           ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads John Ogness
2023-03-03  1:23   ` kernel test robot
2023-03-03 10:56     ` John Ogness
2023-04-05 10:48   ` boot console: was: " Petr Mladek
2023-04-06  8:09   ` wakeup synchronization: " Petr Mladek
2023-04-06  9:46   ` port lock: " Petr Mladek
2023-04-20  9:55     ` Petr Mladek
2023-04-20 10:33       ` John Ogness
2023-04-20 13:33         ` Petr Mladek
2023-04-21 16:15           ` Petr Mladek
2023-04-06 13:19   ` misc: " Petr Mladek
2023-04-13 13:28   ` (k)thread: " Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 12/18] printk: nobkl: Add printer thread wakeups John Ogness
2023-04-12  9:38   ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 13/18] printk: nobkl: Add write context storage for atomic writes John Ogness
2023-03-02 19:56 ` [PATCH printk v1 14/18] printk: nobkl: Provide functions for atomic write enforcement John Ogness
2023-04-12 14:53   ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 15/18] printk: nobkl: Stop threads on shutdown/reboot John Ogness
2023-04-13  9:03   ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic John Ogness
2023-04-13 10:08   ` Petr Mladek
2023-04-13 12:13     ` John Ogness
2023-04-14 10:10       ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 17/18] rcu: Add atomic write enforcement for rcu stalls John Ogness
2023-04-13 12:10   ` Petr Mladek
2023-03-02 19:56 ` [PATCH printk v1 18/18] printk: Perform atomic flush in console_flush_on_panic() John Ogness
2023-04-13 12:20   ` Petr Mladek
2023-03-02 19:58 ` [PATCH printk v1 00/18] serial: 8250: implement non-BKL console John Ogness
2023-03-28 13:33   ` locking API: was: " Petr Mladek
2023-03-28 13:57     ` John Ogness
2023-03-28 15:10       ` Petr Mladek
2023-03-28 21:47         ` John Ogness
2023-03-29  8:03           ` Petr Mladek
2023-03-28 13:59   ` [PATCH printk v1 00/18] POC: serial: 8250: implement nbcon console John Ogness
2023-03-09 10:55 ` [PATCH printk v1 00/18] threaded/atomic console support Daniel Thompson
2023-03-09 11:14   ` John Ogness
