* [PATCH printk v4 00/15] implement threaded console printing
@ 2022-04-21 21:22 John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 01/15] printk: rename cpulock functions John Ogness
                   ` (15 more replies)
  0 siblings, 16 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Randy Dunlap, Marco Elver,
	Stephen Boyd, Alexander Potapenko, Nicholas Piggin,
	Greg Kroah-Hartman, Jiri Slaby, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Kees Cook,
	Luis Chamberlain, Xiaoming Ni, Peter Zijlstra, Andy Shevchenko,
	Corey Minyard, Bjorn Andersson, Sebastian Andrzej Siewior,
	Mark Brown, Daniel Lezcano, Matti Vaittinen, Dmitry Torokhov,
	Eric W. Biederman, Shawn Guo, Wang Qing, rcu

This is v4 of a series to implement a kthread for each registered
console. v3 is here [0]. The kthreads locklessly retrieve the
records from the printk ringbuffer and also do not cause any lock
contention between each other. This allows consoles to run at full
speed. For example, a netconsole is able to dump records much
faster than a serial or vt console. Also, during normal operation,
printk() callers are completely decoupled from console printing.

There are situations where kthread printing is not sufficient, for
example during panic, where the kthreads may not get a chance to
schedule. In such cases, the current method of attempting to print
directly within the printk() caller context is used. New
functions printk_prefer_direct_enter() and
printk_prefer_direct_exit() are made available to mark areas of the
kernel where direct printing is preferred. (These should only be
areas that do not occur during normal operation.)
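
As an illustration of how such enter/exit markers can nest, here is a
minimal userspace sketch. It assumes the preference is tracked with a
simple atomic counter; the cover letter does not spell out the
mechanism, and all names below are hypothetical stand-ins rather than
the kernel functions themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: number of active "prefer direct" sections. */
static atomic_int prefer_direct_count;

/* Mark the start of a section where direct printing is preferred. */
static void prefer_direct_enter(void)
{
	atomic_fetch_add(&prefer_direct_count, 1);
}

/* Mark the end of such a section. Calls must be balanced. */
static void prefer_direct_exit(void)
{
	int old = atomic_fetch_sub(&prefer_direct_count, 1);

	assert(old > 0);
}

/* True while at least one section is active anywhere. */
static bool direct_printing_preferred(void)
{
	return atomic_load(&prefer_direct_count) > 0;
}
```

A counter (rather than a flag) lets independent code paths mark
overlapping sections without coordinating with each other.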

This series also introduces pr_flush(): a might_sleep() function
that will block until all active printing threads have caught up
to the latest record at the time of the pr_flush() call. This
function is useful, for example, to wait until pending records
are flushed to consoles before suspending.

Note that this series does *not* increase the reliability of console
printing. Rather it focuses on the non-interference aspect of
printk() by decoupling printk() callers from printing (during normal
operation). Nonetheless, the reliability aspect should not worsen
due to this series.

John Ogness

[0] https://lore.kernel.org/lkml/20220419234637.357112-1-john.ogness@linutronix.de

Changes since v3:

- For defer_console_output(), call allow_direct_printing() instead
  of only checking @printk_prefer_direct.

- Remove console_lock_single_hold() and
  console_unlock_single_release() functions. Use the console_lock
  for console_stop(), console_start(), unregister_console(),
  printk_kthread_func().

- Introduce macros console_flags_set() and console_flags_clear() to
  adjust con->flags using READ_ONCE()/WRITE_ONCE() in order to
  guarantee consistent values for the variable. (This does not make
  the RMW operations atomic, but the console_lock and con->lock are
  still used to synchronize between tasks that modify con->flags.)

- Add and/or expand comments for allow_direct_printing(),
  console_cpu_notify(), __console_unlock(), console_stop(),
  console_start(), unregister_console(), printk_kthread_func(),
  defer_console_output().

John Ogness (15):
  printk: rename cpulock functions
  printk: cpu sync always disable interrupts
  printk: add missing memory barrier to wake_up_klogd()
  printk: wake up all waiters
  printk: wake waiters for safe and NMI contexts
  printk: get caller_id/timestamp after migration disable
  printk: call boot_delay_msec() in printk_delay()
  printk: add con_printk() macro for console details
  printk: refactor and rework printing logic
  printk: move buffer definitions into console_emit_next_record() caller
  printk: add pr_flush()
  printk: add functions to prefer direct printing
  printk: add kthread console printers
  printk: extend console_lock for proper kthread support
  printk: remove @console_locked

 drivers/tty/sysrq.c     |    2 +
 include/linux/console.h |   19 +
 include/linux/printk.h  |   82 ++-
 kernel/hung_task.c      |   11 +-
 kernel/panic.c          |    4 +
 kernel/printk/printk.c  | 1234 +++++++++++++++++++++++++++++----------
 kernel/rcu/tree_stall.h |    2 +
 kernel/reboot.c         |   14 +-
 kernel/watchdog.c       |    4 +
 kernel/watchdog_hld.c   |    4 +
 lib/dump_stack.c        |    4 +-
 lib/nmi_backtrace.c     |    4 +-
 12 files changed, 1059 insertions(+), 325 deletions(-)


base-commit: 84d7df104dbab9c3dda8f2c5b46f9a6fc256fe02
-- 
2.30.2



* [PATCH printk v4 01/15] printk: rename cpulock functions
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 02/15] printk: cpu sync always disable interrupts John Ogness
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Randy Dunlap, Marco Elver,
	Stephen Boyd, Alexander Potapenko, Nicholas Piggin

Since the printk cpulock is CPU-reentrant and since it is used
in all contexts, its usage must be carefully considered and
most likely will require programming locklessly. To avoid
mistaking the printk cpulock as a typical lock, rename it to
cpu_sync. The main functions then become:

    printk_cpu_sync_get_irqsave(flags);
    printk_cpu_sync_put_irqrestore(flags);

Add extra notes of caution in the function description to help
developers understand the requirements for correct usage.
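
To make the "reentrant on the same CPU" caution concrete, the
following is a rough userspace model of the owner/nesting logic using
C11 atomics. It is only a sketch: the CPU number is passed explicitly
because userspace has no smp_processor_id(), interrupts are not
modelled, and the function names are shortened stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int sync_owner = -1;	/* -1 means no CPU owns the sync */
static atomic_int sync_nested;		/* extra acquisitions by the owner */

/* Returns 1 if @cpu now holds (or already held) the sync, otherwise 0. */
static int cpu_sync_try_get(int cpu)
{
	int expected = -1;

	if (atomic_compare_exchange_strong_explicit(&sync_owner, &expected,
						    cpu,
						    memory_order_acquire,
						    memory_order_relaxed))
		return 1;	/* was unowned, @cpu is the new owner */

	if (expected == cpu) {
		/* Reentrant acquisition: succeeds without exclusivity. */
		atomic_fetch_add(&sync_nested, 1);
		return 1;
	}

	return 0;		/* owned by another CPU */
}

/* Must only be called by the current owner. */
static void cpu_sync_put(void)
{
	if (atomic_load(&sync_nested)) {
		atomic_fetch_sub(&sync_nested, 1);
		return;
	}

	atomic_store_explicit(&sync_owner, -1, memory_order_release);
}
```

Note that the second acquisition by the same CPU succeeds immediately,
which is why the holder cannot assume exclusive access to its data.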

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h | 54 +++++++++++++++++++-------------
 kernel/printk/printk.c | 71 +++++++++++++++++++++---------------------
 lib/dump_stack.c       |  4 +--
 lib/nmi_backtrace.c    |  4 +--
 4 files changed, 73 insertions(+), 60 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1522df223c0f..859323a52985 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -277,43 +277,55 @@ static inline void printk_trigger_flush(void)
 #endif
 
 #ifdef CONFIG_SMP
-extern int __printk_cpu_trylock(void);
-extern void __printk_wait_on_cpu_lock(void);
-extern void __printk_cpu_unlock(void);
+extern int __printk_cpu_sync_try_get(void);
+extern void __printk_cpu_sync_wait(void);
+extern void __printk_cpu_sync_put(void);
 
 /**
- * printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
- *                             lock and disable interrupts.
+ * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning
+ *                                 lock and disable interrupts.
  * @flags: Stack-allocated storage for saving local interrupt state,
- *         to be passed to printk_cpu_unlock_irqrestore().
+ *         to be passed to printk_cpu_sync_put_irqrestore().
  *
  * If the lock is owned by another CPU, spin until it becomes available.
  * Interrupts are restored while spinning.
+ *
+ * CAUTION: This function must be used carefully. It does not behave like a
+ * typical lock. Here are important things to watch out for...
+ *
+ *     * This function is reentrant on the same CPU. Therefore the calling
+ *       code must not assume exclusive access to data if code accessing the
+ *       data can run reentrant or within NMI context on the same CPU.
+ *
+ *     * If there exists usage of this function from NMI context, it becomes
+ *       unsafe to perform any type of locking or spinning to wait for other
+ *       CPUs after calling this function from any context. This includes
+ *       using spinlocks or any other busy-waiting synchronization methods.
  */
-#define printk_cpu_lock_irqsave(flags)		\
-	for (;;) {				\
-		local_irq_save(flags);		\
-		if (__printk_cpu_trylock())	\
-			break;			\
-		local_irq_restore(flags);	\
-		__printk_wait_on_cpu_lock();	\
+#define printk_cpu_sync_get_irqsave(flags)		\
+	for (;;) {					\
+		local_irq_save(flags);			\
+		if (__printk_cpu_sync_try_get())	\
+			break;				\
+		local_irq_restore(flags);		\
+		__printk_cpu_sync_wait();		\
 	}
 
 /**
- * printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant spinning
- *                                  lock and restore interrupts.
- * @flags: Caller's saved interrupt state, from printk_cpu_lock_irqsave().
+ * printk_cpu_sync_put_irqrestore() - Release the printk cpu-reentrant spinning
+ *                                    lock and restore interrupts.
+ * @flags: Caller's saved interrupt state, from printk_cpu_sync_get_irqsave().
  */
-#define printk_cpu_unlock_irqrestore(flags)	\
+#define printk_cpu_sync_put_irqrestore(flags)	\
 	do {					\
-		__printk_cpu_unlock();		\
+		__printk_cpu_sync_put();	\
 		local_irq_restore(flags);	\
-	} while (0)				\
+	} while (0)
 
 #else
 
-#define printk_cpu_lock_irqsave(flags) ((void)flags)
-#define printk_cpu_unlock_irqrestore(flags) ((void)flags)
+#define printk_cpu_sync_get_irqsave(flags) ((void)flags)
+#define printk_cpu_sync_put_irqrestore(flags) ((void)flags)
 
 #endif /* CONFIG_SMP */
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da03c15ecc89..13a1eebe72af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3667,26 +3667,26 @@ EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
 #endif
 
 #ifdef CONFIG_SMP
-static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
-static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
+static atomic_t printk_cpu_sync_owner = ATOMIC_INIT(-1);
+static atomic_t printk_cpu_sync_nested = ATOMIC_INIT(0);
 
 /**
- * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
- *                               spinning lock is not owned by any CPU.
+ * __printk_cpu_sync_wait() - Busy wait until the printk cpu-reentrant
+ *                            spinning lock is not owned by any CPU.
  *
  * Context: Any context.
  */
-void __printk_wait_on_cpu_lock(void)
+void __printk_cpu_sync_wait(void)
 {
 	do {
 		cpu_relax();
-	} while (atomic_read(&printk_cpulock_owner) != -1);
+	} while (atomic_read(&printk_cpu_sync_owner) != -1);
 }
-EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
+EXPORT_SYMBOL(__printk_cpu_sync_wait);
 
 /**
- * __printk_cpu_trylock() - Try to acquire the printk cpu-reentrant
- *                          spinning lock.
+ * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
+ *                               spinning lock.
  *
  * If no processor has the lock, the calling processor takes the lock and
  * becomes the owner. If the calling processor is already the owner of the
@@ -3695,7 +3695,7 @@ EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
  * Context: Any context. Expects interrupts to be disabled.
  * Return: 1 on success, otherwise 0.
  */
-int __printk_cpu_trylock(void)
+int __printk_cpu_sync_try_get(void)
 {
 	int cpu;
 	int old;
@@ -3705,79 +3705,80 @@ int __printk_cpu_trylock(void)
 	/*
 	 * Guarantee loads and stores from this CPU when it is the lock owner
 	 * are _not_ visible to the previous lock owner. This pairs with
-	 * __printk_cpu_unlock:B.
+	 * __printk_cpu_sync_put:B.
 	 *
 	 * Memory barrier involvement:
 	 *
-	 * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B, then
-	 * __printk_cpu_unlock:A can never read from __printk_cpu_trylock:B.
+	 * If __printk_cpu_sync_try_get:A reads from __printk_cpu_sync_put:B,
+	 * then __printk_cpu_sync_put:A can never read from
+	 * __printk_cpu_sync_try_get:B.
 	 *
 	 * Relies on:
 	 *
-	 * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+	 * RELEASE from __printk_cpu_sync_put:A to __printk_cpu_sync_put:B
 	 * of the previous CPU
 	 *    matching
-	 * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
-	 * of this CPU
+	 * ACQUIRE from __printk_cpu_sync_try_get:A to
+	 * __printk_cpu_sync_try_get:B of this CPU
 	 */
-	old = atomic_cmpxchg_acquire(&printk_cpulock_owner, -1,
-				     cpu); /* LMM(__printk_cpu_trylock:A) */
+	old = atomic_cmpxchg_acquire(&printk_cpu_sync_owner, -1,
+				     cpu); /* LMM(__printk_cpu_sync_try_get:A) */
 	if (old == -1) {
 		/*
 		 * This CPU is now the owner and begins loading/storing
-		 * data: LMM(__printk_cpu_trylock:B)
+		 * data: LMM(__printk_cpu_sync_try_get:B)
 		 */
 		return 1;
 
 	} else if (old == cpu) {
 		/* This CPU is already the owner. */
-		atomic_inc(&printk_cpulock_nested);
+		atomic_inc(&printk_cpu_sync_nested);
 		return 1;
 	}
 
 	return 0;
 }
-EXPORT_SYMBOL(__printk_cpu_trylock);
+EXPORT_SYMBOL(__printk_cpu_sync_try_get);
 
 /**
- * __printk_cpu_unlock() - Release the printk cpu-reentrant spinning lock.
+ * __printk_cpu_sync_put() - Release the printk cpu-reentrant spinning lock.
  *
  * The calling processor must be the owner of the lock.
  *
  * Context: Any context. Expects interrupts to be disabled.
  */
-void __printk_cpu_unlock(void)
+void __printk_cpu_sync_put(void)
 {
-	if (atomic_read(&printk_cpulock_nested)) {
-		atomic_dec(&printk_cpulock_nested);
+	if (atomic_read(&printk_cpu_sync_nested)) {
+		atomic_dec(&printk_cpu_sync_nested);
 		return;
 	}
 
 	/*
 	 * This CPU is finished loading/storing data:
-	 * LMM(__printk_cpu_unlock:A)
+	 * LMM(__printk_cpu_sync_put:A)
 	 */
 
 	/*
 	 * Guarantee loads and stores from this CPU when it was the
 	 * lock owner are visible to the next lock owner. This pairs
-	 * with __printk_cpu_trylock:A.
+	 * with __printk_cpu_sync_try_get:A.
 	 *
 	 * Memory barrier involvement:
 	 *
-	 * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B,
-	 * then __printk_cpu_trylock:B reads from __printk_cpu_unlock:A.
+	 * If __printk_cpu_sync_try_get:A reads from __printk_cpu_sync_put:B,
+	 * then __printk_cpu_sync_try_get:B reads from __printk_cpu_sync_put:A.
 	 *
 	 * Relies on:
 	 *
-	 * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+	 * RELEASE from __printk_cpu_sync_put:A to __printk_cpu_sync_put:B
 	 * of this CPU
 	 *    matching
-	 * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
-	 * of the next CPU
+	 * ACQUIRE from __printk_cpu_sync_try_get:A to
+	 * __printk_cpu_sync_try_get:B of the next CPU
 	 */
-	atomic_set_release(&printk_cpulock_owner,
-			   -1); /* LMM(__printk_cpu_unlock:B) */
+	atomic_set_release(&printk_cpu_sync_owner,
+			   -1); /* LMM(__printk_cpu_sync_put:B) */
 }
-EXPORT_SYMBOL(__printk_cpu_unlock);
+EXPORT_SYMBOL(__printk_cpu_sync_put);
 #endif /* CONFIG_SMP */
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 6b7f1bf6715d..83471e81501a 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -102,9 +102,9 @@ asmlinkage __visible void dump_stack_lvl(const char *log_lvl)
 	 * Permit this cpu to perform nested stack dumps while serialising
 	 * against other CPUs
 	 */
-	printk_cpu_lock_irqsave(flags);
+	printk_cpu_sync_get_irqsave(flags);
 	__dump_stack(log_lvl);
-	printk_cpu_unlock_irqrestore(flags);
+	printk_cpu_sync_put_irqrestore(flags);
 }
 EXPORT_SYMBOL(dump_stack_lvl);
 
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 199ab201d501..d01aec6ae15c 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -99,7 +99,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 		 * Allow nested NMI backtraces while serializing
 		 * against other CPUs.
 		 */
-		printk_cpu_lock_irqsave(flags);
+		printk_cpu_sync_get_irqsave(flags);
 		if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
@@ -110,7 +110,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 			else
 				dump_stack();
 		}
-		printk_cpu_unlock_irqrestore(flags);
+		printk_cpu_sync_put_irqrestore(flags);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
2.30.2



* [PATCH printk v4 02/15] printk: cpu sync always disable interrupts
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 01/15] printk: rename cpulock functions John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 03/15] printk: add missing memory barrier to wake_up_klogd() John Ogness
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

For !CONFIG_SMP, the CPU sync functions are currently a NOP.
However, even without SMP they need to disable interrupts in order
to preserve context within the CPU sync sections.
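
The intended effect can be sketched in userspace as follows. The
interrupt state is faked with a plain variable and every name here is
a hypothetical stand-in; the point is only that the outer macros keep
saving and restoring the interrupt state even when the inner sync
helpers are stubbed out for the uniprocessor case.

```c
#include <assert.h>

static int irq_disabled;	/* fake per-CPU interrupt state */

/* Stand-ins for local_irq_save()/local_irq_restore(). */
#define model_irq_save(flags) \
	do { (flags) = irq_disabled; irq_disabled = 1; } while (0)
#define model_irq_restore(flags) \
	do { irq_disabled = (flags); } while (0)

/* Uniprocessor stubs: acquiring always succeeds, nothing to wait for. */
#define model_sync_try_get()	1
#define model_sync_wait()
#define model_sync_put()

/* Outer macros are shared by the SMP and uniprocessor cases. */
#define model_sync_get_irqsave(flags)		\
	for (;;) {				\
		model_irq_save(flags);		\
		if (model_sync_try_get())	\
			break;			\
		model_irq_restore(flags);	\
		model_sync_wait();		\
	}

#define model_sync_put_irqrestore(flags)	\
	do {					\
		model_sync_put();		\
		model_irq_restore(flags);	\
	} while (0)
```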

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 include/linux/printk.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 859323a52985..b70a42f94031 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -281,9 +281,16 @@ extern int __printk_cpu_sync_try_get(void);
 extern void __printk_cpu_sync_wait(void);
 extern void __printk_cpu_sync_put(void);
 
+#else
+
+#define __printk_cpu_sync_try_get() true
+#define __printk_cpu_sync_wait()
+#define __printk_cpu_sync_put()
+#endif /* CONFIG_SMP */
+
 /**
- * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning
- *                                 lock and disable interrupts.
+ * printk_cpu_sync_get_irqsave() - Disable interrupts and acquire the printk
+ *                                 cpu-reentrant spinning lock.
  * @flags: Stack-allocated storage for saving local interrupt state,
  *         to be passed to printk_cpu_sync_put_irqrestore().
  *
@@ -322,13 +329,6 @@ extern void __printk_cpu_sync_put(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#else
-
-#define printk_cpu_sync_get_irqsave(flags) ((void)flags)
-#define printk_cpu_sync_put_irqrestore(flags) ((void)flags)
-
-#endif /* CONFIG_SMP */
-
 extern int kptr_restrict;
 
 /**
-- 
2.30.2



* [PATCH printk v4 03/15] printk: add missing memory barrier to wake_up_klogd()
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 01/15] printk: rename cpulock functions John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 02/15] printk: cpu sync always disable interrupts John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 04/15] printk: wake up all waiters John Ogness
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

It is important that any new records are visible to preparing
waiters before the waker checks if the wait queue is empty.
Otherwise it is possible that:

- there are new records available
- the waker sees an empty wait queue and does not wake
- the preparing waiter sees no new records and begins to wait

This is exactly the problem that the function description of
waitqueue_active() warns about.

Use wq_has_sleeper() instead of waitqueue_active() because it
includes the necessary full memory barrier.
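
The pairing can be modelled in userspace with C11 fences. This is a
simplified sketch, not the kernel wait queue code: a single flag
stands in for "the wait queue is non-empty", a counter stands in for
the ringbuffer, and the function names are made up for illustration.
With a full fence on both sides, at least one side is guaranteed to
observe the other's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int record_seq;		/* number of records stored */
static atomic_bool waiter_queued;	/* stand-in for a non-empty queue */

/* Waker: store the record, full barrier, then look for waiters. */
static bool publish_and_check_waiter(void)
{
	atomic_fetch_add_explicit(&record_seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}

/*
 * Waiter: become visible on the queue, full barrier, then check the
 * wake condition. Returns true if no new record was seen, meaning the
 * waiter would go to sleep.
 */
static bool queue_and_check_sleep(int seen_seq)
{
	atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&record_seq,
				    memory_order_relaxed) == seen_seq;
}
```

Without the fence on the waker side, both loads could return the old
values, which is the lost wakeup described in the list above.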

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 13a1eebe72af..f817dfb4852d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -746,8 +746,19 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			goto out;
 		}
 
+		/*
+		 * Guarantee this task is visible on the waitqueue before
+		 * checking the wake condition.
+		 *
+		 * The full memory barrier within set_current_state() of
+		 * prepare_to_wait_event() pairs with the full memory barrier
+		 * within wq_has_sleeper().
+		 *
+		 * This pairs with wake_up_klogd:A.
+		 */
 		ret = wait_event_interruptible(log_wait,
-				prb_read_valid(prb, atomic64_read(&user->seq), r));
+				prb_read_valid(prb,
+					atomic64_read(&user->seq), r)); /* LMM(devkmsg_read:A) */
 		if (ret)
 			goto out;
 	}
@@ -1513,7 +1524,18 @@ static int syslog_print(char __user *buf, int size)
 		seq = syslog_seq;
 
 		mutex_unlock(&syslog_lock);
-		len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
+		/*
+		 * Guarantee this task is visible on the waitqueue before
+		 * checking the wake condition.
+		 *
+		 * The full memory barrier within set_current_state() of
+		 * prepare_to_wait_event() pairs with the full memory barrier
+		 * within wq_has_sleeper().
+		 *
+		 * This pairs with wake_up_klogd:A.
+		 */
+		len = wait_event_interruptible(log_wait,
+				prb_read_valid(prb, seq, NULL)); /* LMM(syslog_print:A) */
 		mutex_lock(&syslog_lock);
 
 		if (len)
@@ -3316,7 +3338,18 @@ void wake_up_klogd(void)
 		return;
 
 	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
+	/*
+	 * Guarantee any new records can be seen by tasks preparing to wait
+	 * before this context checks if the wait queue is empty.
+	 *
+	 * The full memory barrier within wq_has_sleeper() pairs with the full
+	 * memory barrier within set_current_state() of
+	 * prepare_to_wait_event(), which is called after ___wait_event() adds
+	 * the waiter but before it has checked the wait condition.
+	 *
+	 * This pairs with devkmsg_read:A and syslog_print:A.
+	 */
+	if (wq_has_sleeper(&log_wait)) { /* LMM(wake_up_klogd:A) */
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
-- 
2.30.2



* [PATCH printk v4 04/15] printk: wake up all waiters
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (2 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 03/15] printk: add missing memory barrier to wake_up_klogd() John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 05/15] printk: wake waiters for safe and NMI contexts John Ogness
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

There can be multiple tasks waiting for new records. They should
all be woken. Use wake_up_interruptible_all() instead of
wake_up_interruptible().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f817dfb4852d..e23357002648 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3326,7 +3326,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
+		wake_up_interruptible_all(&log_wait);
 }
 
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) =
-- 
2.30.2



* [PATCH printk v4 05/15] printk: wake waiters for safe and NMI contexts
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (3 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 04/15] printk: wake up all waiters John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 06/15] printk: get caller_id/timestamp after migration disable John Ogness
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

When printk() is called from safe or NMI contexts, it will directly
store the record (vprintk_store()) and then defer the console output.
However, defer_console_output() only causes console printing and does
not wake any waiters of new records.

Wake waiters from defer_console_output() so that they also are aware
of the new records from safe and NMI contexts.
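
The resulting decision logic can be summarized with a small userspace
model. The flag values and names are assumptions chosen for
illustration, and the irq_work queueing is reduced to a return value:
work is queued if there is a sleeper to wake or if console output was
requested, and the requested bits are then recorded as pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define PENDING_WAKEUP	0x01
#define PENDING_OUTPUT	0x02

static int pending;	/* stand-in for the per-CPU pending mask */

/* Returns true if deferred work would be queued for @val. */
static bool queue_pending(int val, bool has_sleeper)
{
	if (has_sleeper || (val & PENDING_OUTPUT)) {
		pending |= val;
		return true;
	}

	return false;
}
```

A plain wakeup request with no sleeper queues nothing, while a
deferred-output request always queues work and carries the wakeup bit
along with it.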

Fixes: 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e23357002648..7bb148a1debb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -754,7 +754,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 		 * prepare_to_wait_event() pairs with the full memory barrier
 		 * within wq_has_sleeper().
 		 *
-		 * This pairs with wake_up_klogd:A.
+		 * This pairs with __wake_up_klogd:A.
 		 */
 		ret = wait_event_interruptible(log_wait,
 				prb_read_valid(prb,
@@ -1532,7 +1532,7 @@ static int syslog_print(char __user *buf, int size)
 		 * prepare_to_wait_event() pairs with the full memory barrier
 		 * within wq_has_sleeper().
 		 *
-		 * This pairs with wake_up_klogd:A.
+		 * This pairs with __wake_up_klogd:A.
 		 */
 		len = wait_event_interruptible(log_wait,
 				prb_read_valid(prb, seq, NULL)); /* LMM(syslog_print:A) */
@@ -3332,7 +3332,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) =
 	IRQ_WORK_INIT_LAZY(wake_up_klogd_work_func);
 
-void wake_up_klogd(void)
+static void __wake_up_klogd(int val)
 {
 	if (!printk_percpu_data_ready())
 		return;
@@ -3349,22 +3349,26 @@ void wake_up_klogd(void)
 	 *
 	 * This pairs with devkmsg_read:A and syslog_print:A.
 	 */
-	if (wq_has_sleeper(&log_wait)) { /* LMM(wake_up_klogd:A) */
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+	if (wq_has_sleeper(&log_wait) || /* LMM(__wake_up_klogd:A) */
+	    (val & PRINTK_PENDING_OUTPUT)) {
+		this_cpu_or(printk_pending, val);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
 	preempt_enable();
 }
 
-void defer_console_output(void)
+void wake_up_klogd(void)
 {
-	if (!printk_percpu_data_ready())
-		return;
+	__wake_up_klogd(PRINTK_PENDING_WAKEUP);
+}
 
-	preempt_disable();
-	this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
+void defer_console_output(void)
+{
+	/*
+	 * New messages may have been added directly to the ringbuffer
+	 * using vprintk_store(), so wake any waiters as well.
+	 */
+	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
 }
 
 void printk_trigger_flush(void)
-- 
2.30.2



* [PATCH printk v4 06/15] printk: get caller_id/timestamp after migration disable
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (4 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 05/15] printk: wake waiters for safe and NMI contexts John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 07/15] printk: call boot_delay_msec() in printk_delay() John Ogness
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Currently the local CPU timestamp and caller_id for the record are
collected while migration is enabled. Since this information is
CPU-specific, it should be collected with migration disabled.

Migration is disabled immediately after collecting this information
anyway, so just move the information collection to after the
migration disabling.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7bb148a1debb..82ad3d3d0d4a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2063,7 +2063,7 @@ static inline void printk_delay(void)
 static inline u32 printk_caller_id(void)
 {
 	return in_task() ? task_pid_nr(current) :
-		0x80000000 + raw_smp_processor_id();
+		0x80000000 + smp_processor_id();
 }
 
 /**
@@ -2145,7 +2145,6 @@ int vprintk_store(int facility, int level,
 		  const struct dev_printk_info *dev_info,
 		  const char *fmt, va_list args)
 {
-	const u32 caller_id = printk_caller_id();
 	struct prb_reserved_entry e;
 	enum printk_info_flags flags = 0;
 	struct printk_record r;
@@ -2155,10 +2154,14 @@ int vprintk_store(int facility, int level,
 	u8 *recursion_ptr;
 	u16 reserve_size;
 	va_list args2;
+	u32 caller_id;
 	u16 text_len;
 	int ret = 0;
 	u64 ts_nsec;
 
+	if (!printk_enter_irqsave(recursion_ptr, irqflags))
+		return 0;
+
 	/*
 	 * Since the duration of printk() can vary depending on the message
 	 * and state of the ringbuffer, grab the timestamp now so that it is
@@ -2167,8 +2170,7 @@ int vprintk_store(int facility, int level,
 	 */
 	ts_nsec = local_clock();
 
-	if (!printk_enter_irqsave(recursion_ptr, irqflags))
-		return 0;
+	caller_id = printk_caller_id();
 
 	/*
 	 * The sprintf needs to come first since the syslog prefix might be
-- 
2.30.2



* [PATCH printk v4 07/15] printk: call boot_delay_msec() in printk_delay()
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (5 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 06/15] printk: get caller_id/timestamp after migration disable John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 08/15] printk: add con_printk() macro for console details John Ogness
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

boot_delay_msec() is always called immediately before printk_delay()
so just call it from within printk_delay().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 82ad3d3d0d4a..2f99e0b383b9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2048,8 +2048,10 @@ static u8 *__printk_recursion_counter(void)
 
 int printk_delay_msec __read_mostly;
 
-static inline void printk_delay(void)
+static inline void printk_delay(int level)
 {
+	boot_delay_msec(level);
+
 	if (unlikely(printk_delay_msec)) {
 		int m = printk_delay_msec;
 
@@ -2274,8 +2276,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		in_sched = true;
 	}
 
-	boot_delay_msec(level);
-	printk_delay();
+	printk_delay(level);
 
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
-- 
2.30.2



* [PATCH printk v4 08/15] printk: add con_printk() macro for console details
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (6 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 07/15] printk: call boot_delay_msec() in printk_delay() John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 09/15] printk: refactor and rework printing logic John Ogness
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

It is useful to generate log messages that include details about
the related console. Rather than duplicate the code to assemble
the details, put that code into a macro con_printk().

Once console printers become threaded, this macro will find more
users.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
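Note: to see what the macro expands to, the prefix logic can be tried
in userspace. The following is only a sketch under assumptions:
con_snprintf() and the reduced struct console are hypothetical
stand-ins, snprintf() replaces printk(), and the log level argument and
pr_fmt() are left out. It relies on the GNU ##__VA_ARGS__ extension,
as the kernel macro does.

```c
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel definitions. */
#define CON_BOOT 0x1

struct console {
	char name[16];
	short flags;
	short index;
};

/*
 * Userspace analogue of con_printk(): the console details are formatted
 * in one place and the caller only supplies the rest of the message.
 * The format string must be a literal so that it can be concatenated.
 */
#define con_snprintf(buf, size, con, fmt, ...)			\
	snprintf(buf, size, "%sconsole [%s%d] " fmt,		\
		 ((con)->flags & CON_BOOT) ? "boot" : "",	\
		 (con)->name, (con)->index, ##__VA_ARGS__)
```

With a console named "ttyS" at index 0, con_snprintf(buf, sizeof(buf),
&con, "enabled\n") yields "console [ttyS0] enabled\n", and the same call
with CON_BOOT set yields "bootconsole [ttyS0] enabled\n".
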
 kernel/printk/printk.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2f99e0b383b9..e36d3ed41afa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3015,6 +3015,11 @@ static void try_enable_default_console(struct console *newcon)
 		newcon->flags |= CON_CONSDEV;
 }
 
+#define con_printk(lvl, con, fmt, ...)			\
+	printk(lvl pr_fmt("%sconsole [%s%d] " fmt),	\
+	       (con->flags & CON_BOOT) ? "boot" : "",	\
+	       con->name, con->index, ##__VA_ARGS__)
+
 /*
  * The console driver calls this routine during kernel initialization
  * to register the console printing procedure with printk() and to
@@ -3153,9 +3158,7 @@ void register_console(struct console *newcon)
 	 * users know there might be something in the kernel's log buffer that
 	 * went to the bootconsole (that they do not see on the real console)
 	 */
-	pr_info("%sconsole [%s%d] enabled\n",
-		(newcon->flags & CON_BOOT) ? "boot" : "" ,
-		newcon->name, newcon->index);
+	con_printk(KERN_INFO, newcon, "enabled\n");
 	if (bootcon_enabled &&
 	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
 	    !keep_bootcon) {
@@ -3174,9 +3177,7 @@ int unregister_console(struct console *console)
 	struct console *con;
 	int res;
 
-	pr_info("%sconsole [%s%d] disabled\n",
-		(console->flags & CON_BOOT) ? "boot" : "" ,
-		console->name, console->index);
+	con_printk(KERN_INFO, console, "disabled\n");
 
 	res = _braille_unregister_console(console);
 	if (res < 0)
-- 
2.30.2



* [PATCH printk v4 09/15] printk: refactor and rework printing logic
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (7 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 08/15] printk: add con_printk() macro for console details John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 10/15] printk: move buffer definitions into console_emit_next_record() caller John Ogness
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Refactor/rework printing logic in order to prepare for moving to
threaded console printing.

- Move @console_seq into struct console so that the current
  "position" of each console can be tracked individually.

- Move @console_dropped into struct console so that the current drop
  count of each console can be tracked individually.

- Modify printing logic so that each console independently loads,
  prepares, and prints its next record.

- Remove exclusive_console logic. Since console positions are
  handled independently, replaying past records occurs naturally.

- Update the comments explaining why preemption is disabled while
  printing from printk() context.

With these changes, there is a change in behavior: the console
replaying the log (formerly exclusive console) will no longer block
other consoles. New messages appear on the other consoles while the
newly added console is still replaying.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
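Note: the per-console bookkeeping described above can be modelled in a
few lines of userspace C. This is a simplified sketch, not the kernel
code: struct log, LOG_SIZE, log_store(), emit_next_record() and
flush_all() are hypothetical names, the ringbuffer is reduced to two
sequence counters, and locking, handover and loglevel filtering are
omitted.

```c
#include <stdbool.h>

#define LOG_SIZE 4	/* hypothetical: tiny log so drops are easy to trigger */

struct log {
	unsigned long long first_seq;	/* oldest record still stored */
	unsigned long long next_seq;	/* seq the next stored record gets */
};

struct console {
	const char *name;
	unsigned long long seq;		/* next record this console prints */
	unsigned long dropped;		/* records lost before being printed */
	unsigned long printed;		/* records actually printed */
};

/* Store one record, overwriting the oldest one when the log is full. */
static void log_store(struct log *log)
{
	log->next_seq++;
	if (log->next_seq - log->first_seq > LOG_SIZE)
		log->first_seq++;
}

/* Print one record; returns false when the console has caught up. */
static bool emit_next_record(struct log *log, struct console *con)
{
	if (con->seq >= log->next_seq)
		return false;

	/* The wanted record is gone: account for the gap and skip ahead. */
	if (con->seq < log->first_seq) {
		con->dropped += log->first_seq - con->seq;
		con->seq = log->first_seq;
	}

	con->seq++;
	con->printed++;
	return true;
}

/* One record per console per pass, until no console makes progress. */
static void flush_all(struct log *log, struct console *cons, int n)
{
	bool any_progress;

	do {
		any_progress = false;
		for (int i = 0; i < n; i++) {
			if (emit_next_record(log, &cons[i]))
				any_progress = true;
		}
	} while (any_progress);
}
```

Because every console carries its own position, a console that starts
at an old sequence number replays the backlog while a console that is
nearly caught up prints only the newest records in the same passes. No
exclusive mode is needed for that.
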
 include/linux/console.h |   2 +
 kernel/printk/printk.c  | 441 +++++++++++++++++++++-------------------
 2 files changed, 230 insertions(+), 213 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..8c1686e2c233 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -151,6 +151,8 @@ struct console {
 	int	cflag;
 	uint	ispeed;
 	uint	ospeed;
+	u64	seq;
+	unsigned long dropped;
 	void	*data;
 	struct	 console *next;
 };
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e36d3ed41afa..3dea8bbaf402 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -280,11 +280,6 @@ static bool panic_in_progress(void)
  */
 static int console_locked, console_suspended;
 
-/*
- * If exclusive_console is non-NULL then only this console is to be printed to.
- */
-static struct console *exclusive_console;
-
 /*
  *	Array of consoles built from command line options (console=)
  */
@@ -374,12 +369,6 @@ static u64 syslog_seq;
 static size_t syslog_partial;
 static bool syslog_time;
 
-/* All 3 protected by @console_sem. */
-/* the next printk record to write to the console */
-static u64 console_seq;
-static u64 exclusive_console_stop_seq;
-static unsigned long console_dropped;
-
 struct latched_seq {
 	seqcount_latch_t	latch;
 	u64			val[2];
@@ -1933,47 +1922,26 @@ static int console_trylock_spinning(void)
 }
 
 /*
- * Call the console drivers, asking them to write out
- * log_buf[start] to log_buf[end - 1].
- * The console_lock must be held.
+ * Call the specified console driver, asking it to write out the specified
+ * text and length. For non-extended consoles, if any records have been
+ * dropped, a dropped message will be written out first.
  */
-static void call_console_drivers(const char *ext_text, size_t ext_len,
-				 const char *text, size_t len)
+static void call_console_driver(struct console *con, const char *text, size_t len)
 {
 	static char dropped_text[64];
-	size_t dropped_len = 0;
-	struct console *con;
+	size_t dropped_len;
 
 	trace_console_rcuidle(text, len);
 
-	if (!console_drivers)
-		return;
-
-	if (console_dropped) {
+	if (con->dropped && !(con->flags & CON_EXTENDED)) {
 		dropped_len = snprintf(dropped_text, sizeof(dropped_text),
 				       "** %lu printk messages dropped **\n",
-				       console_dropped);
-		console_dropped = 0;
+				       con->dropped);
+		con->dropped = 0;
+		con->write(con, dropped_text, dropped_len);
 	}
 
-	for_each_console(con) {
-		if (exclusive_console && con != exclusive_console)
-			continue;
-		if (!(con->flags & CON_ENABLED))
-			continue;
-		if (!con->write)
-			continue;
-		if (!cpu_online(smp_processor_id()) &&
-		    !(con->flags & CON_ANYTIME))
-			continue;
-		if (con->flags & CON_EXTENDED)
-			con->write(con, ext_text, ext_len);
-		else {
-			if (dropped_len)
-				con->write(con, dropped_text, dropped_len);
-			con->write(con, text, len);
-		}
-	}
+	con->write(con, text, len);
 }
 
 /*
@@ -2283,15 +2251,18 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
 		/*
-		 * Disable preemption to avoid being preempted while holding
-		 * console_sem which would prevent anyone from printing to
-		 * console
+		 * The caller may be holding system-critical or
+		 * timing-sensitive locks. Disable preemption during
+		 * printing of all remaining records to all consoles so that
+		 * this context can return as soon as possible. Hopefully
+		 * another printk() caller will take over the printing.
 		 */
 		preempt_disable();
 		/*
 		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
+		 * semaphore. The release will print out buffers. With the
+		 * spinning variant, this context tries to take over the
+		 * printing from another printing context.
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
@@ -2329,11 +2300,9 @@ EXPORT_SYMBOL(_printk);
 
 #define prb_read_valid(rb, seq, r)	false
 #define prb_first_valid_seq(rb)		0
+#define prb_next_seq(rb)		0
 
 static u64 syslog_seq;
-static u64 console_seq;
-static u64 exclusive_console_stop_seq;
-static unsigned long console_dropped;
 
 static size_t record_print_text(const struct printk_record *r,
 				bool syslog, bool time)
@@ -2350,8 +2319,7 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  struct dev_printk_info *dev_info) { return 0; }
 static void console_lock_spinning_enable(void) { }
 static int console_lock_spinning_disable_and_check(void) { return 0; }
-static void call_console_drivers(const char *ext_text, size_t ext_len,
-				 const char *text, size_t len) {}
+static void call_console_driver(struct console *con, const char *text, size_t len) { }
 static bool suppress_message_printing(int level) { return false; }
 
 #endif /* CONFIG_PRINTK */
@@ -2621,22 +2589,6 @@ int is_console_locked(void)
 }
 EXPORT_SYMBOL(is_console_locked);
 
-/*
- * Check if we have any console that is capable of printing while cpu is
- * booting or shutting down. Requires console_sem.
- */
-static int have_callable_console(void)
-{
-	struct console *con;
-
-	for_each_console(con)
-		if ((con->flags & CON_ENABLED) &&
-				(con->flags & CON_ANYTIME))
-			return 1;
-
-	return 0;
-}
-
 /*
  * Return true when this CPU should unlock console_sem without pushing all
  * messages to the console. This reduces the chance that the console is
@@ -2657,15 +2609,182 @@ static bool abandon_console_lock_in_panic(void)
 }
 
 /*
- * Can we actually use the console at this time on this cpu?
+ * Check if the given console is currently capable and allowed to print
+ * records.
+ *
+ * Requires the console_lock.
+ */
+static inline bool console_is_usable(struct console *con)
+{
+	if (!(con->flags & CON_ENABLED))
+		return false;
+
+	if (!con->write)
+		return false;
+
+	/*
+	 * Console drivers may assume that per-cpu resources have been
+	 * allocated. So unless they're explicitly marked as being able to
+	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+	 */
+	if (!cpu_online(raw_smp_processor_id()) &&
+	    !(con->flags & CON_ANYTIME))
+		return false;
+
+	return true;
+}
+
+static void __console_unlock(void)
+{
+	console_locked = 0;
+	up_console_sem();
+}
+
+/*
+ * Print one record for the given console. The record printed is whatever
+ * record is the next available record for the given console.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * Returns false if the given console has no next record to print, otherwise
+ * true.
  *
- * Console drivers may assume that per-cpu resources have been allocated. So
- * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
- * call them until this CPU is officially up.
+ * Requires the console_lock.
  */
-static inline int can_use_console(void)
+static bool console_emit_next_record(struct console *con, bool *handover)
 {
-	return cpu_online(raw_smp_processor_id()) || have_callable_console();
+	static char ext_text[CONSOLE_EXT_LOG_MAX];
+	static char text[CONSOLE_LOG_MAX];
+	static int panic_console_dropped;
+	struct printk_info info;
+	struct printk_record r;
+	unsigned long flags;
+	char *write_text;
+	size_t len;
+
+	prb_rec_init_rd(&r, &info, text, sizeof(text));
+
+	*handover = false;
+
+	if (!prb_read_valid(prb, con->seq, &r))
+		return false;
+
+	if (con->seq != r.info->seq) {
+		con->dropped += r.info->seq - con->seq;
+		con->seq = r.info->seq;
+		if (panic_in_progress() && panic_console_dropped++ > 10) {
+			suppress_panic_printk = 1;
+			pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
+		}
+	}
+
+	/* Skip record that has level above the console loglevel. */
+	if (suppress_message_printing(r.info->level)) {
+		con->seq++;
+		goto skip;
+	}
+
+	if (con->flags & CON_EXTENDED) {
+		write_text = &ext_text[0];
+		len = info_print_ext_header(ext_text, sizeof(ext_text), r.info);
+		len += msg_print_ext_body(ext_text + len, sizeof(ext_text) - len,
+					  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
+	} else {
+		write_text = &text[0];
+		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
+	}
+
+	/*
+	 * While actively printing out messages, if another printk()
+	 * were to occur on another CPU, it may wait for this one to
+	 * finish. This task can not be preempted if there is a
+	 * waiter waiting to take over.
+	 *
+	 * Interrupts are disabled because the hand over to a waiter
+	 * must not be interrupted until the hand over is completed
+	 * (@console_waiter is cleared).
+	 */
+	printk_safe_enter_irqsave(flags);
+	console_lock_spinning_enable();
+
+	stop_critical_timings();	/* don't trace print latency */
+	call_console_driver(con, write_text, len);
+	start_critical_timings();
+
+	con->seq++;
+
+	*handover = console_lock_spinning_disable_and_check();
+	printk_safe_exit_irqrestore(flags);
+skip:
+	return true;
+}
+
+/*
+ * Print out all remaining records to all consoles.
+ *
+ * @do_cond_resched is set by the caller. It can be true only in schedulable
+ * context.
+ *
+ * @next_seq is set to the sequence number after the last available record.
+ * The value is valid only when this function returns true. It means that all
+ * usable consoles are completely flushed.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * Returns true when there was at least one usable console and all messages
+ * were flushed to all usable consoles. A returned false informs the caller
+ * that not everything was flushed (either there were no usable consoles or
+ * another context has taken over printing or it is a panic situation and this
+ * is not the panic CPU). Regardless of the reason, the caller should assume
+ * it is not useful to immediately try again.
+ *
+ * Requires the console_lock.
+ */
+static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
+{
+	bool any_usable = false;
+	struct console *con;
+	bool any_progress;
+
+	*next_seq = 0;
+	*handover = false;
+
+	do {
+		any_progress = false;
+
+		for_each_console(con) {
+			bool progress;
+
+			if (!console_is_usable(con))
+				continue;
+			any_usable = true;
+
+			progress = console_emit_next_record(con, handover);
+			if (*handover)
+				return false;
+
+			/* Track the next of the highest seq flushed. */
+			if (con->seq > *next_seq)
+				*next_seq = con->seq;
+
+			if (!progress)
+				continue;
+			any_progress = true;
+
+			/* Allow panic_cpu to take over the consoles safely. */
+			if (abandon_console_lock_in_panic())
+				return false;
+
+			if (do_cond_resched)
+				cond_resched();
+		}
+	} while (any_progress);
+
+	return any_usable;
 }
 
 /**
@@ -2678,28 +2797,20 @@ static inline int can_use_console(void)
  * by printk().  If this is the case, console_unlock(); emits
  * the output prior to releasing the lock.
  *
- * If there is output waiting, we wake /dev/kmsg and syslog() users.
- *
  * console_unlock(); may be called from any context.
  */
 void console_unlock(void)
 {
-	static char ext_text[CONSOLE_EXT_LOG_MAX];
-	static char text[CONSOLE_LOG_MAX];
-	static int panic_console_dropped;
-	unsigned long flags;
-	bool do_cond_resched, retry;
-	struct printk_info info;
-	struct printk_record r;
-	u64 __maybe_unused next_seq;
+	bool do_cond_resched;
+	bool handover;
+	bool flushed;
+	u64 next_seq;
 
 	if (console_suspended) {
 		up_console_sem();
 		return;
 	}
 
-	prb_rec_init_rd(&r, &info, text, sizeof(text));
-
 	/*
 	 * Console drivers are called with interrupts disabled, so
 	 * @console_may_schedule should be cleared before; however, we may
@@ -2708,125 +2819,34 @@ void console_unlock(void)
 	 * between lines if allowable.  Not doing so can cause a very long
 	 * scheduling stall on a slow console leading to RCU stall and
 	 * softlockup warnings which exacerbate the issue with more
-	 * messages practically incapacitating the system.
-	 *
-	 * console_trylock() is not able to detect the preemptive
-	 * context reliably. Therefore the value must be stored before
-	 * and cleared after the "again" goto label.
+	 * messages practically incapacitating the system. Therefore, create
+	 * a local to use for the printing loop.
 	 */
 	do_cond_resched = console_may_schedule;
-again:
-	console_may_schedule = 0;
-
-	/*
-	 * We released the console_sem lock, so we need to recheck if
-	 * cpu is online and (if not) is there at least one CON_ANYTIME
-	 * console.
-	 */
-	if (!can_use_console()) {
-		console_locked = 0;
-		up_console_sem();
-		return;
-	}
 
-	for (;;) {
-		size_t ext_len = 0;
-		int handover;
-		size_t len;
-
-skip:
-		if (!prb_read_valid(prb, console_seq, &r))
-			break;
-
-		if (console_seq != r.info->seq) {
-			console_dropped += r.info->seq - console_seq;
-			console_seq = r.info->seq;
-			if (panic_in_progress() && panic_console_dropped++ > 10) {
-				suppress_panic_printk = 1;
-				pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
-			}
-		}
-
-		if (suppress_message_printing(r.info->level)) {
-			/*
-			 * Skip record we have buffered and already printed
-			 * directly to the console when we received it, and
-			 * record that has level above the console loglevel.
-			 */
-			console_seq++;
-			goto skip;
-		}
+	do {
+		console_may_schedule = 0;
 
-		/* Output to all consoles once old messages replayed. */
-		if (unlikely(exclusive_console &&
-			     console_seq >= exclusive_console_stop_seq)) {
-			exclusive_console = NULL;
-		}
+		flushed = console_flush_all(do_cond_resched, &next_seq, &handover);
+		if (!handover)
+			__console_unlock();
 
 		/*
-		 * Handle extended console text first because later
-		 * record_print_text() will modify the record buffer in-place.
+		 * Abort if there was a failure to flush all messages to all
+		 * usable consoles. Either it is not possible to flush (in
+		 * which case it would be an infinite loop of retrying) or
+		 * another context has taken over printing.
 		 */
-		if (nr_ext_console_drivers) {
-			ext_len = info_print_ext_header(ext_text,
-						sizeof(ext_text),
-						r.info);
-			ext_len += msg_print_ext_body(ext_text + ext_len,
-						sizeof(ext_text) - ext_len,
-						&r.text_buf[0],
-						r.info->text_len,
-						&r.info->dev_info);
-		}
-		len = record_print_text(&r,
-				console_msg_format & MSG_FORMAT_SYSLOG,
-				printk_time);
-		console_seq++;
+		if (!flushed)
+			break;
 
 		/*
-		 * While actively printing out messages, if another printk()
-		 * were to occur on another CPU, it may wait for this one to
-		 * finish. This task can not be preempted if there is a
-		 * waiter waiting to take over.
-		 *
-		 * Interrupts are disabled because the hand over to a waiter
-		 * must not be interrupted until the hand over is completed
-		 * (@console_waiter is cleared).
+		 * Some context may have added new records after
+		 * console_flush_all() but before unlocking the console.
+		 * Re-check if there is a new record to flush. If the trylock
+		 * fails, another context is already handling the printing.
 		 */
-		printk_safe_enter_irqsave(flags);
-		console_lock_spinning_enable();
-
-		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(ext_text, ext_len, text, len);
-		start_critical_timings();
-
-		handover = console_lock_spinning_disable_and_check();
-		printk_safe_exit_irqrestore(flags);
-		if (handover)
-			return;
-
-		/* Allow panic_cpu to take over the consoles safely */
-		if (abandon_console_lock_in_panic())
-			break;
-
-		if (do_cond_resched)
-			cond_resched();
-	}
-
-	/* Get consistent value of the next-to-be-used sequence number. */
-	next_seq = console_seq;
-
-	console_locked = 0;
-	up_console_sem();
-
-	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
-	 */
-	retry = prb_read_valid(prb, next_seq, NULL);
-	if (retry && !abandon_console_lock_in_panic() && console_trylock())
-		goto again;
+	} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
 }
 EXPORT_SYMBOL(console_unlock);
 
@@ -2886,8 +2906,14 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	console_trylock();
 	console_may_schedule = 0;
 
-	if (mode == CONSOLE_REPLAY_ALL)
-		console_seq = prb_first_valid_seq(prb);
+	if (mode == CONSOLE_REPLAY_ALL) {
+		struct console *c;
+		u64 seq;
+
+		seq = prb_first_valid_seq(prb);
+		for_each_console(c)
+			c->seq = seq;
+	}
 	console_unlock();
 }
 
@@ -3127,26 +3153,15 @@ void register_console(struct console *newcon)
 	if (newcon->flags & CON_EXTENDED)
 		nr_ext_console_drivers++;
 
+	newcon->dropped = 0;
 	if (newcon->flags & CON_PRINTBUFFER) {
-		/*
-		 * console_unlock(); will print out the buffered messages
-		 * for us.
-		 *
-		 * We're about to replay the log buffer.  Only do this to the
-		 * just-registered console to avoid excessive message spam to
-		 * the already-registered consoles.
-		 *
-		 * Set exclusive_console with disabled interrupts to reduce
-		 * race window with eventual console_flush_on_panic() that
-		 * ignores console_lock.
-		 */
-		exclusive_console = newcon;
-		exclusive_console_stop_seq = console_seq;
-
 		/* Get a consistent copy of @syslog_seq. */
 		mutex_lock(&syslog_lock);
-		console_seq = syslog_seq;
+		newcon->seq = syslog_seq;
 		mutex_unlock(&syslog_lock);
+	} else {
+		/* Begin with next message. */
+		newcon->seq = prb_next_seq(prb);
 	}
 	console_unlock();
 	console_sysfs_notify();
-- 
2.30.2



* [PATCH printk v4 10/15] printk: move buffer definitions into console_emit_next_record() caller
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (8 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 09/15] printk: refactor and rework printing logic John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 11/15] printk: add pr_flush() John Ogness
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Extended consoles print extended messages and do not print messages about
dropped records.

Non-extended consoles print "normal" messages as well as extra messages
about dropped records.

Currently the buffers for these various message types are defined within
the functions that might use them and their usage is based upon the
CON_EXTENDED flag. This will be a problem when moving to kthread printers
because each printer must be able to provide its own buffers.

Move all the message buffer definitions outside of
console_emit_next_record(). The caller knows if extended or dropped
messages should be printed and can specify the appropriate buffers to
use. The console_emit_next_record() and call_console_driver() functions
can determine what to print based on whether the specified buffers are
non-NULL.

With this change, buffer definition/allocation/specification is separated
from the code that does the various types of string printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
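Note: the "NULL buffer means do not print that kind of message"
convention can be illustrated with a userspace sketch. Everything here
is an assumption made for illustration: the buffer sizes, the reduced
struct console, con_write(), stored_len() and emit() are hypothetical,
and "ext:" is only a placeholder for the real extended record format.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes; the patch uses CONSOLE_LOG_MAX and friends. */
#define TEXT_MAX		128
#define EXT_TEXT_MAX		256
#define DROPPED_TEXT_MAX	64

struct console {
	unsigned long dropped;	/* records lost before being printed */
	char out[512];		/* stand-in for the device: collects output */
	size_t out_len;
};

/* Append to the collected output, truncating if it would overflow. */
static void con_write(struct console *con, const char *s, size_t len)
{
	size_t room = sizeof(con->out) - 1 - con->out_len;

	if (len > room)
		len = room;
	memcpy(con->out + con->out_len, s, len);
	con->out_len += len;
	con->out[con->out_len] = '\0';
}

/* snprintf() reports the untruncated length; clamp it to what was stored. */
static size_t stored_len(int ret, size_t size)
{
	if (ret < 0)
		return 0;
	return (size_t)ret < size ? (size_t)ret : size - 1;
}

/* The caller owns all buffers; a NULL buffer disables that message type. */
static void emit(struct console *con, const char *msg, char *text,
		 char *ext_text, char *dropped_text)
{
	char *write_text;
	size_t len;

	if (ext_text) {
		write_text = ext_text;
		len = stored_len(snprintf(ext_text, EXT_TEXT_MAX,
					  "ext:%s\n", msg), EXT_TEXT_MAX);
	} else {
		write_text = text;
		len = stored_len(snprintf(text, TEXT_MAX, "%s\n", msg),
				 TEXT_MAX);
	}

	if (con->dropped && dropped_text) {
		size_t dlen = stored_len(snprintf(dropped_text, DROPPED_TEXT_MAX,
						  "** %lu printk messages dropped **\n",
						  con->dropped), DROPPED_TEXT_MAX);

		con->dropped = 0;
		con_write(con, dropped_text, dlen);
	}

	con_write(con, write_text, len);
}
```

A non-extended caller passes (text, NULL, dropped_text) and an extended
caller passes (text, ext_text, NULL). emit() never looks at console
flags, so a per-console thread could hand in its own buffers without
any change to the printing code.
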
 kernel/printk/printk.c | 60 ++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 17 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3dea8bbaf402..dec5355c5b5b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -394,6 +394,9 @@ static struct latched_seq clear_seq = {
 /* the maximum size of a formatted record (i.e. with prefix added per line) */
 #define CONSOLE_LOG_MAX		1024
 
+/* the maximum size for a dropped text message */
+#define DROPPED_TEXT_MAX	64
+
 /* the maximum size allowed to be reserved for a record */
 #define LOG_LINE_MAX		(CONSOLE_LOG_MAX - PREFIX_MAX)
 
@@ -1923,18 +1926,18 @@ static int console_trylock_spinning(void)
 
 /*
  * Call the specified console driver, asking it to write out the specified
- * text and length. For non-extended consoles, if any records have been
+ * text and length. If @dropped_text is non-NULL and any records have been
  * dropped, a dropped message will be written out first.
  */
-static void call_console_driver(struct console *con, const char *text, size_t len)
+static void call_console_driver(struct console *con, const char *text, size_t len,
+				char *dropped_text)
 {
-	static char dropped_text[64];
 	size_t dropped_len;
 
 	trace_console_rcuidle(text, len);
 
-	if (con->dropped && !(con->flags & CON_EXTENDED)) {
-		dropped_len = snprintf(dropped_text, sizeof(dropped_text),
+	if (con->dropped && dropped_text) {
+		dropped_len = snprintf(dropped_text, DROPPED_TEXT_MAX,
 				       "** %lu printk messages dropped **\n",
 				       con->dropped);
 		con->dropped = 0;
@@ -2296,6 +2299,7 @@ EXPORT_SYMBOL(_printk);
 #else /* CONFIG_PRINTK */
 
 #define CONSOLE_LOG_MAX		0
+#define DROPPED_TEXT_MAX	0
 #define printk_time		false
 
 #define prb_read_valid(rb, seq, r)	false
@@ -2319,7 +2323,10 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  struct dev_printk_info *dev_info) { return 0; }
 static void console_lock_spinning_enable(void) { }
 static int console_lock_spinning_disable_and_check(void) { return 0; }
-static void call_console_driver(struct console *con, const char *text, size_t len) { }
+static void call_console_driver(struct console *con, const char *text, size_t len,
+				char *dropped_text)
+{
+}
 static bool suppress_message_printing(int level) { return false; }
 
 #endif /* CONFIG_PRINTK */
@@ -2644,6 +2651,14 @@ static void __console_unlock(void)
  * Print one record for the given console. The record printed is whatever
  * record is the next available record for the given console.
  *
+ * @text is a buffer of size CONSOLE_LOG_MAX.
+ *
+ * If extended messages should be printed, @ext_text is a buffer of size
+ * CONSOLE_EXT_LOG_MAX. Otherwise @ext_text must be NULL.
+ *
+ * If dropped messages should be printed, @dropped_text is a buffer of size
+ * DROPPED_TEXT_MAX. Otherwise @dropped_text must be NULL.
+ *
  * @handover will be set to true if a printk waiter has taken over the
  * console_lock, in which case the caller is no longer holding the
  * console_lock. Otherwise it is set to false.
@@ -2653,10 +2668,9 @@ static void __console_unlock(void)
  *
  * Requires the console_lock.
  */
-static bool console_emit_next_record(struct console *con, bool *handover)
+static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
+				     char *dropped_text, bool *handover)
 {
-	static char ext_text[CONSOLE_EXT_LOG_MAX];
-	static char text[CONSOLE_LOG_MAX];
 	static int panic_console_dropped;
 	struct printk_info info;
 	struct printk_record r;
@@ -2664,7 +2678,7 @@ static bool console_emit_next_record(struct console *con, bool *handover)
 	char *write_text;
 	size_t len;
 
-	prb_rec_init_rd(&r, &info, text, sizeof(text));
+	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
 	*handover = false;
 
@@ -2686,13 +2700,13 @@ static bool console_emit_next_record(struct console *con, bool *handover)
 		goto skip;
 	}
 
-	if (con->flags & CON_EXTENDED) {
-		write_text = &ext_text[0];
-		len = info_print_ext_header(ext_text, sizeof(ext_text), r.info);
-		len += msg_print_ext_body(ext_text + len, sizeof(ext_text) - len,
+	if (ext_text) {
+		write_text = ext_text;
+		len = info_print_ext_header(ext_text, CONSOLE_EXT_LOG_MAX, r.info);
+		len += msg_print_ext_body(ext_text + len, CONSOLE_EXT_LOG_MAX - len,
 					  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
 	} else {
-		write_text = &text[0];
+		write_text = text;
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
 	}
 
@@ -2710,7 +2724,7 @@ static bool console_emit_next_record(struct console *con, bool *handover)
 	console_lock_spinning_enable();
 
 	stop_critical_timings();	/* don't trace print latency */
-	call_console_driver(con, write_text, len);
+	call_console_driver(con, write_text, len, dropped_text);
 	start_critical_timings();
 
 	con->seq++;
@@ -2746,6 +2760,9 @@ static bool console_emit_next_record(struct console *con, bool *handover)
  */
 static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
 {
+	static char dropped_text[DROPPED_TEXT_MAX];
+	static char ext_text[CONSOLE_EXT_LOG_MAX];
+	static char text[CONSOLE_LOG_MAX];
 	bool any_usable = false;
 	struct console *con;
 	bool any_progress;
@@ -2763,7 +2780,16 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 				continue;
 			any_usable = true;
 
-			progress = console_emit_next_record(con, handover);
+			if (con->flags & CON_EXTENDED) {
+				/* Extended consoles do not print "dropped messages". */
+				progress = console_emit_next_record(con, &text[0],
+								    &ext_text[0], NULL,
+								    handover);
+			} else {
+				progress = console_emit_next_record(con, &text[0],
+								    NULL, &dropped_text[0],
+								    handover);
+			}
 			if (*handover)
 				return false;
 
-- 
2.30.2



* [PATCH printk v4 11/15] printk: add pr_flush()
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (9 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 10/15] printk: move buffer definitions into console_emit_next_record() caller John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 12/15] printk: add functions to prefer direct printing John Ogness
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Provide a might-sleep function to allow waiting for console printers
to catch up to the latest logged message.

Use pr_flush() whenever it is desirable to get buffered messages
printed before continuing: suspend_console(), resume_console(),
console_stop(), console_start(), console_unblank().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
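Note: one plausible reading of the (timeout_ms, reset_on_progress)
parameters is a bounded wait whose budget is refilled whenever the
printer advances. The sketch below models only that reading and is not
taken from the patch: struct printer and flush_wait() are hypothetical,
and a simulated 1 ms tick stands in for sleeping.

```c
#include <stdbool.h>

/* Hypothetical printer state for the simulation. */
struct printer {
	unsigned long long seq;		/* next record to print */
	unsigned int per_tick;		/* records printed per simulated ms */
};

/*
 * Wait until the printer has reached @target. Returns true if it caught
 * up, false if the time budget ran out first.
 */
static bool flush_wait(struct printer *p, unsigned long long target,
		       int timeout_ms, bool reset_on_progress)
{
	int remaining = timeout_ms;

	while (p->seq < target) {
		unsigned long long before = p->seq;

		if (remaining <= 0)
			return false;	/* timed out */

		/* Simulate sleeping 1 ms while the printer works. */
		p->seq += p->per_tick;
		remaining--;

		/* A printer that still advances gets a fresh budget. */
		if (reset_on_progress && p->seq != before)
			remaining = timeout_ms;
	}

	return true;
}
```

In this model a slow but working printer is never cut off when
reset_on_progress is true, while a stalled printer still times out
after timeout_ms. That would match call sites such as pr_flush(1000,
true), where waiting is worthwhile only as long as output keeps
appearing.
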
 include/linux/printk.h |  7 ++++
 kernel/printk/printk.c | 83 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index b70a42f94031..091fba7283e1 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -170,6 +170,8 @@ extern void __printk_safe_exit(void);
 #define printk_deferred_enter __printk_safe_enter
 #define printk_deferred_exit __printk_safe_exit
 
+extern bool pr_flush(int timeout_ms, bool reset_on_progress);
+
 /*
  * Please don't use printk_ratelimit(), because it shares ratelimiting state
  * with all other unrelated printk_ratelimit() callsites.  Instead use
@@ -220,6 +222,11 @@ static inline void printk_deferred_exit(void)
 {
 }
 
+static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
+{
+	return true;
+}
+
 static inline int printk_ratelimit(void)
 {
 	return 0;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index dec5355c5b5b..a06999d55278 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2296,6 +2296,8 @@ asmlinkage __visible int _printk(const char *fmt, ...)
 }
 EXPORT_SYMBOL(_printk);
 
+static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);
+
 #else /* CONFIG_PRINTK */
 
 #define CONSOLE_LOG_MAX		0
@@ -2328,6 +2330,7 @@ static void call_console_driver(struct console *con, const char *text, size_t le
 {
 }
 static bool suppress_message_printing(int level) { return false; }
+static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
 
 #endif /* CONFIG_PRINTK */
 
@@ -2515,6 +2518,7 @@ void suspend_console(void)
 	if (!console_suspend_enabled)
 		return;
 	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
+	pr_flush(1000, true);
 	console_lock();
 	console_suspended = 1;
 	up_console_sem();
@@ -2527,6 +2531,7 @@ void resume_console(void)
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
+	pr_flush(1000, true);
 }
 
 /**
@@ -2912,6 +2917,9 @@ void console_unblank(void)
 		if ((c->flags & CON_ENABLED) && c->unblank)
 			c->unblank();
 	console_unlock();
+
+	if (!oops_in_progress)
+		pr_flush(1000, true);
 }
 
 /**
@@ -2970,6 +2978,7 @@ struct tty_driver *console_device(int *index)
  */
 void console_stop(struct console *console)
 {
+	__pr_flush(console, 1000, true);
 	console_lock();
 	console->flags &= ~CON_ENABLED;
 	console_unlock();
@@ -2981,6 +2990,7 @@ void console_start(struct console *console)
 	console_lock();
 	console->flags |= CON_ENABLED;
 	console_unlock();
+	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
 
@@ -3352,6 +3362,79 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
+/* If @con is specified, only wait for that console. Otherwise wait for all. */
+static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress)
+{
+	int remaining = timeout_ms;
+	struct console *c;
+	u64 last_diff = 0;
+	u64 printk_seq;
+	u64 diff;
+	u64 seq;
+
+	might_sleep();
+
+	seq = prb_next_seq(prb);
+
+	for (;;) {
+		diff = 0;
+
+		console_lock();
+		for_each_console(c) {
+			if (con && con != c)
+				continue;
+			if (!console_is_usable(c))
+				continue;
+			printk_seq = c->seq;
+			if (printk_seq < seq)
+				diff += seq - printk_seq;
+		}
+		console_unlock();
+
+		if (diff != last_diff && reset_on_progress)
+			remaining = timeout_ms;
+
+		if (diff == 0 || remaining == 0)
+			break;
+
+		if (remaining < 0) {
+			/* no timeout limit */
+			msleep(100);
+		} else if (remaining < 100) {
+			msleep(remaining);
+			remaining = 0;
+		} else {
+			msleep(100);
+			remaining -= 100;
+		}
+
+		last_diff = diff;
+	}
+
+	return (diff == 0);
+}
+
+/**
+ * pr_flush() - Wait for printing threads to catch up.
+ *
+ * @timeout_ms:        The maximum time (in ms) to wait.
+ * @reset_on_progress: Reset the timeout if forward progress is seen.
+ *
+ * A value of 0 for @timeout_ms means no waiting will occur. A value of -1
+ * represents infinite waiting.
+ *
+ * If @reset_on_progress is true, the timeout will be reset whenever any
+ * printer has been seen to make some forward progress.
+ *
+ * Context: Process context. May sleep while acquiring console lock.
+ * Return: true if all enabled printers are caught up.
+ */
+bool pr_flush(int timeout_ms, bool reset_on_progress)
+{
+	return __pr_flush(NULL, timeout_ms, reset_on_progress);
+}
+EXPORT_SYMBOL(pr_flush);
+
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-- 
2.30.2



* [PATCH printk v4 12/15] printk: add functions to prefer direct printing
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (10 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 11/15] printk: add pr_flush() John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:22 ` [PATCH printk v4 13/15] printk: add kthread console printers John Ogness
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, Jiri Slaby, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Kees Cook,
	Andrew Morton, Luis Chamberlain, Xiaoming Ni, Peter Zijlstra,
	Andy Shevchenko, Corey Minyard, Bjorn Andersson,
	Sebastian Andrzej Siewior, Marco Elver, Mark Brown,
	Daniel Lezcano, Matti Vaittinen, Dmitry Torokhov,
	Eric W. Biederman, Shawn Guo, Wang Qing, rcu

Once kthread printing is available, console printing will no longer
occur in the context of the printk caller. However, there are some
special contexts where it is desirable for the printk caller to
directly print out kernel messages. Using pr_flush() to wait for
threaded printers is only possible if the caller is in a sleepable
context and the kthreads are active. That is not always the case.

Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit()
functions to explicitly (and globally) activate/deactivate preferred
direct console printing. The term "direct console printing" refers to
printing to all enabled consoles from the context of the printk
caller. The term "prefer" is used because this type of printing is
only best effort. If the console is currently locked or other
printers are already actively printing, the printk caller will need
to rely on the other contexts to handle the printing.

This preferred direct printing is how all printing has been handled
until now (unless it was explicitly deferred).

When kthread printing is introduced, there may be some unanticipated
problems due to kthreads being unable to flush important messages.
To minimize such risks, preferred direct printing is activated for
the most important messages when the system experiences the
following general types of major errors:

 - emergency reboot/shutdown
 - cpu and rcu stalls
 - hard and soft lockups
 - hung tasks
 - warn
 - sysrq

Note that since kthread printing does not yet exist, this commit
does not change any behavior. It only implements the counter and
marks the places where preferred direct printing is active.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
---
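A minimal userspace sketch of the nesting counter, not part of this
patch. It uses C11 atomics as a stand-in for the kernel's atomic_t, and
the names prefer_direct_enter(), prefer_direct_exit() and
direct_preferred() are hypothetical. The compare-exchange loop mimics
what atomic_dec_if_positive() is used for here: an unbalanced exit must
not drive the counter negative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of contexts currently requesting direct printing (starts at 0). */
static atomic_int prefer_direct;

/* Request direct printing. Calls may nest and may come from any thread. */
static void prefer_direct_enter(void)
{
	atomic_fetch_add(&prefer_direct, 1);
}

/*
 * Drop one request. The counter is only decremented while it is
 * positive. Returns false on an unbalanced exit, which is the case
 * where the patch triggers WARN_ON().
 */
static bool prefer_direct_exit(void)
{
	int old = atomic_load(&prefer_direct);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&prefer_direct, &old, old - 1))
			return true;
		/* On failure, @old holds the current value. Try again. */
	}

	return false;
}

/* Direct printing is preferred while at least one request is active. */
static bool direct_preferred(void)
{
	return atomic_load(&prefer_direct) != 0;
}
```

Because this is a counter rather than a flag, overlapping sections (for
example a sysrq handler running while a hung task report is being
printed) keep direct printing preferred until the last of them exits.
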
 drivers/tty/sysrq.c     |  2 ++
 include/linux/printk.h  | 11 +++++++++++
 kernel/hung_task.c      | 11 ++++++++++-
 kernel/panic.c          |  4 ++++
 kernel/printk/printk.c  | 28 ++++++++++++++++++++++++++++
 kernel/rcu/tree_stall.h |  2 ++
 kernel/reboot.c         | 14 +++++++++++++-
 kernel/watchdog.c       |  4 ++++
 kernel/watchdog_hld.c   |  4 ++++
 9 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index bbfd004449b5..2884cd638d64 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -578,6 +578,7 @@ void __handle_sysrq(int key, bool check_mask)
 
 	rcu_sysrq_start();
 	rcu_read_lock();
+	printk_prefer_direct_enter();
 	/*
 	 * Raise the apparent loglevel to maximum so that the sysrq header
 	 * is shown to provide the user with positive feedback.  We do not
@@ -619,6 +620,7 @@ void __handle_sysrq(int key, bool check_mask)
 		pr_cont("\n");
 		console_loglevel = orig_log_level;
 	}
+	printk_prefer_direct_exit();
 	rcu_read_unlock();
 	rcu_sysrq_end();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 091fba7283e1..cd26aab0ab2a 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -170,6 +170,9 @@ extern void __printk_safe_exit(void);
 #define printk_deferred_enter __printk_safe_enter
 #define printk_deferred_exit __printk_safe_exit
 
+extern void printk_prefer_direct_enter(void);
+extern void printk_prefer_direct_exit(void);
+
 extern bool pr_flush(int timeout_ms, bool reset_on_progress);
 
 /*
@@ -222,6 +225,14 @@ static inline void printk_deferred_exit(void)
 {
 }
 
+static inline void printk_prefer_direct_enter(void)
+{
+}
+
+static inline void printk_prefer_direct_exit(void)
+{
+}
+
 static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
 {
 	return true;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 52501e5f7655..02a65d554340 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -127,6 +127,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
+		printk_prefer_direct_enter();
+
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@@ -142,6 +144,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 
 		if (sysctl_hung_task_all_cpu_backtrace)
 			hung_task_show_all_bt = true;
+
+		printk_prefer_direct_exit();
 	}
 
 	touch_nmi_watchdog();
@@ -204,12 +208,17 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	}
  unlock:
 	rcu_read_unlock();
-	if (hung_task_show_lock)
+	if (hung_task_show_lock) {
+		printk_prefer_direct_enter();
 		debug_show_all_locks();
+		printk_prefer_direct_exit();
+	}
 
 	if (hung_task_show_all_bt) {
 		hung_task_show_all_bt = false;
+		printk_prefer_direct_enter();
 		trigger_all_cpu_backtrace();
+		printk_prefer_direct_exit();
 	}
 
 	if (hung_task_call_panic)
diff --git a/kernel/panic.c b/kernel/panic.c
index 55b50e052ec3..7d422597403f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -560,6 +560,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 {
 	disable_trace_on_warning();
 
+	printk_prefer_direct_enter();
+
 	if (file)
 		pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
 			raw_smp_processor_id(), current->pid, file, line,
@@ -597,6 +599,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 
 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	printk_prefer_direct_exit();
 }
 
 #ifndef __WARN_FLAGS
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a06999d55278..ed7f738261cc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -362,6 +362,34 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
 static DEFINE_MUTEX(syslog_lock);
 
 #ifdef CONFIG_PRINTK
+static atomic_t printk_prefer_direct = ATOMIC_INIT(0);
+
+/**
+ * printk_prefer_direct_enter - cause printk() calls to attempt direct
+ *                              printing to all enabled consoles
+ *
+ * Since it is not possible to call into the console printing code from any
+ * context, there is no guarantee that direct printing will occur.
+ *
+ * This globally affects all printk() callers.
+ *
+ * Context: Any context.
+ */
+void printk_prefer_direct_enter(void)
+{
+	atomic_inc(&printk_prefer_direct);
+}
+
+/**
+ * printk_prefer_direct_exit - restore printk() behavior
+ *
+ * Context: Any context.
+ */
+void printk_prefer_direct_exit(void)
+{
+	WARN_ON(atomic_dec_if_positive(&printk_prefer_direct) < 0);
+}
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 0c5d8516516a..d612707c2ed0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -619,6 +619,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	printk_prefer_direct_enter();
 	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
@@ -656,6 +657,7 @@ static void print_cpu_stall(unsigned long gps)
 	 */
 	set_tsk_need_resched(current);
 	set_preempt_need_resched();
+	printk_prefer_direct_exit();
 }
 
 static void check_cpu_stall(struct rcu_data *rdp)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 6bcc5d6a6572..4177645e74d6 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -447,9 +447,11 @@ static int __orderly_reboot(void)
 	ret = run_cmd(reboot_cmd);
 
 	if (ret) {
+		printk_prefer_direct_enter();
 		pr_warn("Failed to start orderly reboot: forcing the issue\n");
 		emergency_sync();
 		kernel_restart(NULL);
+		printk_prefer_direct_exit();
 	}
 
 	return ret;
@@ -462,6 +464,7 @@ static int __orderly_poweroff(bool force)
 	ret = run_cmd(poweroff_cmd);
 
 	if (ret && force) {
+		printk_prefer_direct_enter();
 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
 
 		/*
@@ -471,6 +474,7 @@ static int __orderly_poweroff(bool force)
 		 */
 		emergency_sync();
 		kernel_power_off();
+		printk_prefer_direct_exit();
 	}
 
 	return ret;
@@ -528,6 +532,8 @@ EXPORT_SYMBOL_GPL(orderly_reboot);
  */
 static void hw_failure_emergency_poweroff_func(struct work_struct *work)
 {
+	printk_prefer_direct_enter();
+
 	/*
 	 * We have reached here after the emergency shutdown waiting period has
 	 * expired. This means orderly_poweroff has not been able to shut off
@@ -544,6 +550,8 @@ static void hw_failure_emergency_poweroff_func(struct work_struct *work)
 	 */
 	pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
 	emergency_restart();
+
+	printk_prefer_direct_exit();
 }
 
 static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
@@ -582,11 +590,13 @@ void hw_protection_shutdown(const char *reason, int ms_until_forced)
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);
 
+	printk_prefer_direct_enter();
+
 	pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason);
 
 	/* Shutdown should be initiated only once. */
 	if (!atomic_dec_and_test(&allow_proceed))
-		return;
+		goto out;
 
 	/*
 	 * Queue a backup emergency shutdown in the event of
@@ -594,6 +604,8 @@ void hw_protection_shutdown(const char *reason, int ms_until_forced)
 	 */
 	hw_failure_emergency_poweroff(ms_until_forced);
 	orderly_poweroff(true);
+out:
+	printk_prefer_direct_exit();
 }
 EXPORT_SYMBOL_GPL(hw_protection_shutdown);
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9166220457bc..40024e03d422 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -424,6 +424,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		/* Start period for the next softlockup warning. */
 		update_report_ts();
 
+		printk_prefer_direct_enter();
+
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
@@ -442,6 +444,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
+
+		printk_prefer_direct_exit();
 	}
 
 	return HRTIMER_RESTART;
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..701f35f0e2d4 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -135,6 +135,8 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;
 
+		printk_prefer_direct_enter();
+
 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
 			 this_cpu);
 		print_modules();
@@ -155,6 +157,8 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
+		printk_prefer_direct_exit();
+
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;
 	}
-- 
2.30.2



* [PATCH printk v4 13/15] printk: add kthread console printers
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (11 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 12/15] printk: add functions to prefer direct printing John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-22  7:48   ` Petr Mladek
  2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Create a kthread for each console to perform console printing. During
normal operation (@system_state == SYSTEM_RUNNING), the kthread
printers are responsible for all printing on their respective
consoles.

During non-normal operation, console printing is done as it has been:
within the context of the printk caller or within irqwork triggered
by the printk caller, referred to as direct printing.

Since threaded console printers are responsible for all printing
during normal operation, this also includes messages generated via
deferred printk calls. If direct printing is in effect during a
deferred printk call, the queued irqwork will perform the direct
printing. To make it clear that this is the only time that the
irqwork will perform direct printing, rename the flag
PRINTK_PENDING_OUTPUT to PRINTK_PENDING_DIRECT_OUTPUT.

Threaded console printers synchronize against each other and against
console lockers by taking the console lock for each message that is
printed.

Note that the kthread printers do not care about direct printing.
They will always try to print if new records are available. They can
be blocked by direct printing, but will be woken again once direct
printing is finished.

Console unregistration is a bit tricky because the associated
kthread printer cannot be stopped while the console lock is held.
A policy is implemented that states: whichever task clears
con->thread (under the console lock) is responsible for stopping
the kthread. unregister_console() will clear con->thread while
the console lock is held and then stop the kthread after releasing
the console lock.

For consoles that have implemented the exit() callback, the kthread
is stopped before exit() is called.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
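A minimal userspace sketch of the decision made by
allow_direct_printing(), not part of this patch. The struct and enum
below are hypothetical stand-ins for the kernel globals
(printk_kthreads_available, system_state, oops_in_progress,
printk_prefer_direct), and the enum is reduced to three values because
only the ordering relative to the running state matters for the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the kernel's system_state values. */
enum model_system_state {
	MODEL_BOOTING,
	MODEL_RUNNING,
	MODEL_SHUTTING_DOWN,
};

/* Hypothetical snapshot of the globals consulted by the real function. */
struct model_printk_state {
	bool kthreads_available;	/* printk_kthreads_available */
	enum model_system_state system_state;
	bool oops_in_progress;
	int prefer_direct;		/* value of the printk_prefer_direct counter */
};

/* Mirrors the order of the checks in allow_direct_printing(). */
static bool model_allow_direct_printing(const struct model_printk_state *s)
{
	/* Without kthread printers, direct printing is the only option. */
	if (!s->kthreads_available)
		return true;

	/* Otherwise print directly only if the system is in a problematic state. */
	return s->system_state > MODEL_RUNNING ||
	       s->oops_in_progress ||
	       s->prefer_direct > 0;
}
```

Note that the state comparison is "greater than running". In this
model, a booting system with kthreads available therefore leaves
printing to the kthreads, the same as a running one.
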
 include/linux/console.h |   2 +
 kernel/printk/printk.c  | 329 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 309 insertions(+), 22 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 8c1686e2c233..9a251e70c090 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -153,6 +153,8 @@ struct console {
 	uint	ospeed;
 	u64	seq;
 	unsigned long dropped;
+	struct task_struct *thread;
+
 	void	*data;
 	struct	 console *next;
 };
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ed7f738261cc..e4cdc424c826 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -361,6 +361,13 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_MUTEX(syslog_lock);
 
+/*
+ * A flag to signify if printk_activate_kthreads() has already started the
+ * kthread printers. If true, any later registered consoles must start their
+ * own kthread directly. The flag is write protected by the console_lock.
+ */
+static bool printk_kthreads_available;
+
 #ifdef CONFIG_PRINTK
 static atomic_t printk_prefer_direct = ATOMIC_INIT(0);
 
@@ -390,6 +397,39 @@ void printk_prefer_direct_exit(void)
 	WARN_ON(atomic_dec_if_positive(&printk_prefer_direct) < 0);
 }
 
+/*
+ * Calling printk() always wakes kthread printers so that they can
+ * flush the new message to their respective consoles. Also, if direct
+ * printing is allowed, printk() tries to flush the messages directly.
+ *
+ * Direct printing is allowed in situations when the kthreads
+ * are not available or the system is in a problematic state.
+ *
+ * See the implementation for details about possible races.
+ */
+static inline bool allow_direct_printing(void)
+{
+	/*
+	 * Checking kthread availability is a possible race because the
+	 * kthread printers can become permanently disabled during runtime.
+	 * However, doing that requires holding the console_lock, so any
+	 * pending messages will be direct printed by console_unlock().
+	 */
+	if (!printk_kthreads_available)
+		return true;
+
+	/*
+	 * Prefer direct printing when the system is in a problematic state.
+	 * The context that sets this state will always see the updated value.
+	 * The other contexts do not care. Anyway, direct printing is just a
+	 * best effort. The direct output is only possible when console_lock
+	 * is not already taken and no kthread printers are actively printing.
+	 */
+	return (system_state > SYSTEM_RUNNING ||
+		oops_in_progress ||
+		atomic_read(&printk_prefer_direct));
+}
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -2280,10 +2320,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (!in_sched && allow_direct_printing()) {
 		/*
 		 * The caller may be holding system-critical or
-		 * timing-sensitive locks. Disable preemption during
+		 * timing-sensitive locks. Disable preemption during direct
 		 * printing of all remaining records to all consoles so that
 		 * this context can return as soon as possible. Hopefully
 		 * another printk() caller will take over the printing.
@@ -2326,6 +2366,8 @@ EXPORT_SYMBOL(_printk);
 
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);
 
+static void printk_start_kthread(struct console *con);
+
 #else /* CONFIG_PRINTK */
 
 #define CONSOLE_LOG_MAX		0
@@ -2359,6 +2401,8 @@ static void call_console_driver(struct console *con, const char *text, size_t le
 }
 static bool suppress_message_printing(int level) { return false; }
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
+static void printk_start_kthread(struct console *con) { }
+static bool allow_direct_printing(void) { return true; }
 
 #endif /* CONFIG_PRINTK */
 
@@ -2559,6 +2603,13 @@ void resume_console(void)
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
+
+	/*
+	 * While suspended, new records may have been added to the
+	 * ringbuffer. Wake up the kthread printers to print them.
+	 */
+	wake_up_klogd();
+
 	pr_flush(1000, true);
 }
 
@@ -2577,6 +2628,9 @@ static int console_cpu_notify(unsigned int cpu)
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
+
+		/* Wake kthread printers. Some may have become usable. */
+		wake_up_klogd();
 	}
 	return 0;
 }
@@ -2648,18 +2702,9 @@ static bool abandon_console_lock_in_panic(void)
 	return atomic_read(&panic_cpu) != raw_smp_processor_id();
 }
 
-/*
- * Check if the given console is currently capable and allowed to print
- * records.
- *
- * Requires the console_lock.
- */
-static inline bool console_is_usable(struct console *con)
+static inline bool __console_is_usable(short flags)
 {
-	if (!(con->flags & CON_ENABLED))
-		return false;
-
-	if (!con->write)
+	if (!(flags & CON_ENABLED))
 		return false;
 
 	/*
@@ -2668,12 +2713,26 @@ static inline bool console_is_usable(struct console *con)
 	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
 	 */
 	if (!cpu_online(raw_smp_processor_id()) &&
-	    !(con->flags & CON_ANYTIME))
+	    !(flags & CON_ANYTIME))
 		return false;
 
 	return true;
 }
 
+/*
+ * Check if the given console is currently capable and allowed to print
+ * records.
+ *
+ * Requires the console_lock.
+ */
+static inline bool console_is_usable(struct console *con)
+{
+	if (!con->write)
+		return false;
+
+	return __console_is_usable(con->flags);
+}
+
 static void __console_unlock(void)
 {
 	console_locked = 0;
@@ -2786,8 +2845,8 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
  * were flushed to all usable consoles. A returned false informs the caller
  * that everything was not flushed (either there were no usable consoles or
  * another context has taken over printing or it is a panic situation and this
- * is not the panic CPU). Regardless the reason, the caller should assume it
- * is not useful to immediately try again.
+ * is not the panic CPU or direct printing is not preferred). Regardless the
+ * reason, the caller should assume it is not useful to immediately try again.
  *
  * Requires the console_lock.
  */
@@ -2804,6 +2863,10 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 	*handover = false;
 
 	do {
+		/* Let the kthread printers do the work if they can. */
+		if (!allow_direct_printing())
+			return false;
+
 		any_progress = false;
 
 		for_each_console(con) {
@@ -3018,6 +3081,10 @@ void console_start(struct console *console)
 	console_lock();
 	console->flags |= CON_ENABLED;
 	console_unlock();
+
+	/* Wake the newly enabled kthread printer. */
+	wake_up_klogd();
+
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -3218,6 +3285,8 @@ void register_console(struct console *newcon)
 		nr_ext_console_drivers++;
 
 	newcon->dropped = 0;
+	newcon->thread = NULL;
+
 	if (newcon->flags & CON_PRINTBUFFER) {
 		/* Get a consistent copy of @syslog_seq. */
 		mutex_lock(&syslog_lock);
@@ -3227,6 +3296,10 @@ void register_console(struct console *newcon)
 		/* Begin with next message. */
 		newcon->seq = prb_next_seq(prb);
 	}
+
+	if (printk_kthreads_available)
+		printk_start_kthread(newcon);
+
 	console_unlock();
 	console_sysfs_notify();
 
@@ -3253,6 +3326,7 @@ EXPORT_SYMBOL(register_console);
 
 int unregister_console(struct console *console)
 {
+	struct task_struct *thd;
 	struct console *con;
 	int res;
 
@@ -3293,7 +3367,20 @@ int unregister_console(struct console *console)
 		console_drivers->flags |= CON_CONSDEV;
 
 	console->flags &= ~CON_ENABLED;
+
+	/*
+	 * console->thread can only be cleared under the console lock. But
+	 * stopping the thread must be done without the console lock. The
+	 * task that clears @thread is the task that stops the kthread.
+	 */
+	thd = console->thread;
+	console->thread = NULL;
+
 	console_unlock();
+
+	if (thd)
+		kthread_stop(thd);
+
 	console_sysfs_notify();
 
 	if (console->exit)
@@ -3389,6 +3476,20 @@ static int __init printk_late_init(void)
 }
 late_initcall(printk_late_init);
 
+static int __init printk_activate_kthreads(void)
+{
+	struct console *con;
+
+	console_lock();
+	printk_kthreads_available = true;
+	for_each_console(con)
+		printk_start_kthread(con);
+	console_unlock();
+
+	return 0;
+}
+early_initcall(printk_activate_kthreads);
+
 #if defined CONFIG_PRINTK
 /* If @con is specified, only wait for that console. Otherwise wait for all. */
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress)
@@ -3463,11 +3564,180 @@ bool pr_flush(int timeout_ms, bool reset_on_progress)
 }
 EXPORT_SYMBOL(pr_flush);
 
+static void __printk_fallback_preferred_direct(void)
+{
+	printk_prefer_direct_enter();
+	pr_err("falling back to preferred direct printing\n");
+	printk_kthreads_available = false;
+}
+
+/*
+ * Enter preferred direct printing, but never exit. Mark console threads as
+ * unavailable. The system is then forever in preferred direct printing and
+ * any printing threads will exit.
+ *
+ * Must *not* be called under console_lock. Use
+ * __printk_fallback_preferred_direct() if already holding console_lock.
+ */
+static void printk_fallback_preferred_direct(void)
+{
+	console_lock();
+	__printk_fallback_preferred_direct();
+	console_unlock();
+}
+
+static bool printer_should_wake(struct console *con, u64 seq)
+{
+	short flags;
+
+	if (kthread_should_stop() || !printk_kthreads_available)
+		return true;
+
+	if (console_suspended)
+		return false;
+
+	/*
+	 * This is an unsafe read from con->flags, but a false positive is
+	 * not a problem. Worst case it would allow the printer to wake up
+	 * although it is disabled. But the printer will notice that when
+	 * attempting to print and instead go back to sleep.
+	 */
+	flags = data_race(READ_ONCE(con->flags));
+
+	if (!__console_is_usable(flags))
+		return false;
+
+	return prb_read_valid(prb, seq, NULL);
+}
+
+static int printk_kthread_func(void *data)
+{
+	struct console *con = data;
+	char *dropped_text = NULL;
+	char *ext_text = NULL;
+	bool handover;
+	u64 seq = 0;
+	char *text;
+	int error;
+
+	text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
+	if (!text) {
+		con_printk(KERN_ERR, con, "failed to allocate text buffer\n");
+		printk_fallback_preferred_direct();
+		goto out;
+	}
+
+	if (con->flags & CON_EXTENDED) {
+		ext_text = kmalloc(CONSOLE_EXT_LOG_MAX, GFP_KERNEL);
+		if (!ext_text) {
+			con_printk(KERN_ERR, con, "failed to allocate ext_text buffer\n");
+			printk_fallback_preferred_direct();
+			goto out;
+		}
+	} else {
+		dropped_text = kmalloc(DROPPED_TEXT_MAX, GFP_KERNEL);
+		if (!dropped_text) {
+			con_printk(KERN_ERR, con, "failed to allocate dropped_text buffer\n");
+			printk_fallback_preferred_direct();
+			goto out;
+		}
+	}
+
+	con_printk(KERN_INFO, con, "printing thread started\n");
+
+	for (;;) {
+		/*
+		 * Guarantee this task is visible on the waitqueue before
+		 * checking the wake condition.
+		 *
+		 * The full memory barrier within set_current_state() of
+		 * prepare_to_wait_event() pairs with the full memory barrier
+		 * within wq_has_sleeper().
+		 *
+		 * This pairs with __wake_up_klogd:A.
+		 */
+		error = wait_event_interruptible(log_wait,
+				printer_should_wake(con, seq)); /* LMM(printk_kthread_func:A) */
+
+		if (kthread_should_stop() || !printk_kthreads_available)
+			break;
+
+		if (error)
+			continue;
+
+		console_lock();
+
+		if (console_suspended) {
+			up_console_sem();
+			continue;
+		}
+
+		if (!console_is_usable(con)) {
+			__console_unlock();
+			continue;
+		}
+
+		/*
+		 * Even though the printk kthread is always preemptible, it is
+		 * still not allowed to call cond_resched() from within
+		 * console drivers. The task may become non-preemptible in the
+		 * console driver call chain. For example, vt_console_print()
+		 * takes a spinlock and then can call into fbcon_redraw(),
+		 * which can conditionally invoke cond_resched().
+		 */
+		console_may_schedule = 0;
+		console_emit_next_record(con, text, ext_text, dropped_text, &handover);
+		if (handover)
+			continue;
+
+		seq = con->seq;
+
+		__console_unlock();
+	}
+
+	con_printk(KERN_INFO, con, "printing thread stopped\n");
+out:
+	kfree(dropped_text);
+	kfree(ext_text);
+	kfree(text);
+
+	console_lock();
+	/*
+	 * If this kthread is being stopped by another task, con->thread will
+	 * already be NULL. That is fine. The important thing is that it is
+	 * NULL after the kthread exits.
+	 */
+	con->thread = NULL;
+	console_unlock();
+
+	return 0;
+}
+
+/* Must be called under console_lock. */
+static void printk_start_kthread(struct console *con)
+{
+	/*
+	 * Do not start a kthread if there is no write() callback. The
+	 * kthreads assume the write() callback exists.
+	 */
+	if (!con->write)
+		return;
+
+	con->thread = kthread_run(printk_kthread_func, con,
+				  "pr/%s%d", con->name, con->index);
+	if (IS_ERR(con->thread)) {
+		con->thread = NULL;
+		con_printk(KERN_ERR, con, "unable to start printing thread\n");
+		__printk_fallback_preferred_direct();
+		return;
+	}
+}
+
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
+#define PRINTK_PENDING_WAKEUP		0x01
+#define PRINTK_PENDING_DIRECT_OUTPUT	0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
 
@@ -3475,10 +3745,14 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = this_cpu_xchg(printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (pending & PRINTK_PENDING_DIRECT_OUTPUT) {
+		printk_prefer_direct_enter();
+
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
+
+		printk_prefer_direct_exit();
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
@@ -3503,10 +3777,11 @@ static void __wake_up_klogd(int val)
 	 * prepare_to_wait_event(), which is called after ___wait_event() adds
 	 * the waiter but before it has checked the wait condition.
 	 *
-	 * This pairs with devkmsg_read:A and syslog_print:A.
+	 * This pairs with devkmsg_read:A, syslog_print:A, and
+	 * printk_kthread_func:A.
 	 */
 	if (wq_has_sleeper(&log_wait) || /* LMM(__wake_up_klogd:A) */
-	    (val & PRINTK_PENDING_OUTPUT)) {
+	    (val & PRINTK_PENDING_DIRECT_OUTPUT)) {
 		this_cpu_or(printk_pending, val);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
@@ -3524,7 +3799,17 @@ void defer_console_output(void)
 	 * New messages may have been added directly to the ringbuffer
 	 * using vprintk_store(), so wake any waiters as well.
 	 */
-	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
+	int val = PRINTK_PENDING_WAKEUP;
+
+	/*
+	 * Make sure that some context will print the messages when direct
+	 * printing is allowed. This happens in situations when the kthreads
+	 * may not be as reliable or perhaps unusable.
+	 */
+	if (allow_direct_printing())
+		val |= PRINTK_PENDING_DIRECT_OUTPUT;
+
+	__wake_up_klogd(val);
 }
 
 void printk_trigger_flush(void)
-- 
2.30.2


* [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (12 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 13/15] printk: add kthread console printers John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-21 21:40   ` John Ogness
                     ` (2 more replies)
  2022-04-21 21:22 ` [PATCH printk v4 15/15] printk: remove @console_locked John Ogness
  2022-04-22  9:39 ` [PATCH printk v4 00/15] implement threaded console printing Petr Mladek
  15 siblings, 3 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Currently threaded console printers synchronize against each
other using console_lock(). However, different console drivers
are unrelated and do not require any synchronization between
each other. Removing the synchronization between the threaded
console printers will allow each console to print at its own
speed.

But the threaded console printers do still need to synchronize
against console_lock() callers. Introduce a per-console mutex
and a new console flag CON_THD_BLOCKED to provide this
synchronization.

console_lock() is modified so that it must acquire the mutex
of each console in order to set the CON_THD_BLOCKED flag.
Console printing threads will acquire their mutex while
printing a record. If CON_THD_BLOCKED was set, the thread will
go back to sleep instead of printing.

The reason for the CON_THD_BLOCKED flag is so that
console_lock() callers do not need to acquire multiple console
mutexes simultaneously, which would introduce unnecessary
complexity due to nested mutex locking.
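
As a rough userspace illustration of why the flag avoids nested
locking, the sketch below uses pthread mutexes. All sketch_* names and
SKETCH_THD_BLOCKED are hypothetical stand-ins, not the kernel
implementation. The blocker holds at most one mutex at a time, and the
flag keeps each printer out after its mutex has been released again.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_THD_BLOCKED	128	/* hypothetical flag bit */

struct sketch_console {
	pthread_mutex_t lock;
	int flags;
	int printed;			/* number of "records" emitted */
	struct sketch_console *next;
};

/* Models console_lock(): never holds more than one mutex at once. */
static void sketch_block_all(struct sketch_console *head)
{
	struct sketch_console *c;

	for (c = head; c; c = c->next) {
		pthread_mutex_lock(&c->lock);
		c->flags |= SKETCH_THD_BLOCKED;
		pthread_mutex_unlock(&c->lock);
	}
}

/* Models the unlock side: clears the flag for every console. */
static void sketch_unblock_all(struct sketch_console *head)
{
	struct sketch_console *c;

	for (c = head; c; c = c->next) {
		pthread_mutex_lock(&c->lock);
		c->flags &= ~SKETCH_THD_BLOCKED;
		pthread_mutex_unlock(&c->lock);
	}
}

/* One printer iteration. Returns true if a record was "printed". */
static bool sketch_printer_step(struct sketch_console *c)
{
	bool printed = false;

	pthread_mutex_lock(&c->lock);
	if (!(c->flags & SKETCH_THD_BLOCKED)) {
		c->printed++;
		printed = true;
	}
	pthread_mutex_unlock(&c->lock);

	return printed;
}
```

Because the flag is only read and written under the console's own
mutex, a printer that already holds its mutex finishes the current
record before sketch_block_all() can set the flag for that console.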

Threaded console printers also need to synchronize against
console_trylock() callers. Since console_trylock() may be
called from any context, the per-console mutex cannot be used
for this synchronization. (mutex_trylock() cannot be called
from atomic contexts.) Introduce a global atomic counter to
identify if any threaded printers are active. The threaded
printers will also check the atomic counter to identify if the
console has been locked by another task via console_trylock().

Note that @console_sem is still used to provide synchronization
between console_lock() and console_trylock() callers.

A locking overview for console_lock(), console_trylock(), and the
threaded printers is as follows (pseudo code):

console_lock()
{
        down(&console_sem);
        for_each_console(con) {
                mutex_lock(&con->lock);
                con->flags |= CON_THD_BLOCKED;
                mutex_unlock(&con->lock);
        }
        /* console_lock acquired */
}

console_trylock()
{
        if (down_trylock(&console_sem) == 0) {
                if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
                        /* console_lock acquired */
                }
        }
}

threaded_printer()
{
        mutex_lock(&con->lock);
        if (!(con->flags & CON_THD_BLOCKED)) {
                /* console_lock() callers blocked */

                if (atomic_inc_unless_negative(&console_kthreads_active)) {
                        /* console_trylock() callers blocked */

                        con->write();

                        atomic_dec(&console_kthreads_active);
                }
        }
        mutex_unlock(&con->lock);
}
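
The tri-state counter from the overview above can be modelled in
userspace with C11 atomics. This is only a sketch under assumptions:
the model_* names are hypothetical, and model_printing_tryenter() is a
hand-written approximation of what atomic_inc_unless_negative() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of @console_kthreads_active (not kernel code):
 * -1 = blocked by a trylock owner, 0 = idle, >0 = printers active.
 */
static atomic_int model_kthreads_active;

/* Models console_trylock(): succeeds only if no printer is active. */
static bool model_atomic_tryblock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&model_kthreads_active,
					      &expected, -1);
}

/* Models the unlock side: -1 -> 0, a no-op for any other value. */
static void model_atomic_unblock(void)
{
	int expected = -1;

	atomic_compare_exchange_strong(&model_kthreads_active,
				       &expected, 0);
}

/* Increment the counter unless it is negative (trylock owner). */
static bool model_printing_tryenter(void)
{
	int cur = atomic_load(&model_kthreads_active);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&model_kthreads_active,
						 &cur, cur + 1))
			return true;
	}

	return false;
}

static void model_printing_exit(void)
{
	int old = atomic_fetch_sub(&model_kthreads_active, 1);

	assert(old > 0);	/* must pair with a successful tryenter */
	(void)old;
}
```

In this model several printers can be inside the counter at once, but
a trylock caller and a printer can never be inside at the same time.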

The console owner and waiter logic now only applies between contexts
that have taken the console_lock via console_trylock(). Threaded
printers never take the console_lock, so they do not have a
console_lock to hand over. Tasks that have used console_lock() block
the threaded printers using a mutex. If the console_lock were handed
over to an atomic context, that context would be unable to unblock
the threaded printers because it cannot take a mutex. However, the
console_trylock() case is really the only scenario that is
interesting for handovers anyway.

@panic_console_dropped must change to atomic_t since it is no longer
protected exclusively by the console_lock.
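
A small userspace sketch of that conversion, using C11 atomics rather
than the kernel's atomic_t. The demo_* names are hypothetical. The
point is that a fetch-and-add returns the value from before the
increment, so the threshold check keeps the meaning of the old
"panic_console_dropped++ > 10" while being safe for concurrent callers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace counterpart of @panic_console_dropped. */
static atomic_int demo_dropped;

/*
 * Count one dropped-message event. Returns true once more than 10
 * earlier events have been counted, i.e. from the 12th call onwards.
 */
static bool demo_count_drop(void)
{
	return atomic_fetch_add_explicit(&demo_dropped, 1,
					 memory_order_relaxed) > 10;
}
```

Relaxed ordering is used in the sketch because the counter is only a
threshold and does not order any other memory accesses.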

Since threaded printers remain asleep if they see that the console
is locked, they now must be explicitly woken in __console_unlock().
This means wake_up_klogd() calls following a console_unlock() are
no longer necessary and are removed.

Also note that threaded printers no longer need to check
@console_suspended. The check for the CON_THD_BLOCKED flag
implicitly covers the suspended console case.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h |  15 ++
 kernel/printk/printk.c  | 296 +++++++++++++++++++++++++++++++---------
 2 files changed, 248 insertions(+), 63 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 9a251e70c090..c1fd4f41c547 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
 
 #include <linux/atomic.h>
 #include <linux/types.h>
+#include <linux/mutex.h>
 
 struct vc_data;
 struct console_font_op;
@@ -136,6 +137,7 @@ static inline int con_debug_leave(void)
 #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
 #define CON_BRL		(32) /* Used for a braille device */
 #define CON_EXTENDED	(64) /* Use the extended output format a la /dev/kmsg */
+#define CON_THD_BLOCKED	(128) /* Thread blocked because console is locked */
 
 struct console {
 	char	name[16];
@@ -155,6 +157,19 @@ struct console {
 	unsigned long dropped;
 	struct task_struct *thread;
 
+	/*
+	 * The per-console lock is used by printing kthreads to synchronize
+	 * this console with callers of console_lock(). This is necessary in
+	 * order to allow printing kthreads to run in parallel to each other,
+	 * while each safely accessing their own @flags and synchronizing
+	 * against direct printing via console_lock/console_unlock.
+	 *
+	 * Note: For synchronizing against direct printing via
+	 *       console_trylock/console_unlock, see the static global
+	 *       variable @console_kthreads_active.
+	 */
+	struct mutex lock;
+
 	void	*data;
 	struct	 console *next;
 };
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e4cdc424c826..7243a85564ef 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -223,6 +223,33 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 /* Number of registered extended console drivers. */
 static int nr_ext_console_drivers;
 
+/*
+ * Used to synchronize printing kthreads against direct printing via
+ * console_trylock/console_unlock.
+ *
+ * Values:
+ * -1 = console kthreads atomically blocked (via global trylock)
+ *  0 = no kthread printing, console not locked (via trylock)
+ * >0 = kthread(s) actively printing
+ *
+ * Note: For synchronizing against direct printing via
+ *       console_lock/console_unlock, see the @lock variable in
+ *       struct console.
+ */
+static atomic_t console_kthreads_active = ATOMIC_INIT(0);
+
+#define console_kthreads_atomic_tryblock() \
+	(atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0)
+#define console_kthreads_atomic_unblock() \
+	atomic_cmpxchg(&console_kthreads_active, -1, 0)
+#define console_kthreads_atomically_blocked() \
+	(atomic_read(&console_kthreads_active) == -1)
+
+#define console_kthread_printing_tryenter() \
+	atomic_inc_unless_negative(&console_kthreads_active)
+#define console_kthread_printing_exit() \
+	atomic_dec(&console_kthreads_active)
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -270,6 +297,49 @@ static bool panic_in_progress(void)
 	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 }
 
+/*
+ * Tracks whether kthread printers are all blocked. A value of true implies
+ * that the console is locked via console_lock() or the console is suspended.
+ * Reading and writing to this variable requires holding @console_sem.
+ */
+static bool console_kthreads_blocked;
+
+/*
+ * Block all kthread printers from a schedulable context.
+ *
+ * Requires holding @console_sem.
+ */
+static void console_kthreads_block(void)
+{
+	struct console *con;
+
+	for_each_console(con) {
+		mutex_lock(&con->lock);
+		con->flags |= CON_THD_BLOCKED;
+		mutex_unlock(&con->lock);
+	}
+
+	console_kthreads_blocked = true;
+}
+
+/*
+ * Unblock all kthread printers from a schedulable context.
+ *
+ * Requires holding @console_sem.
+ */
+static void console_kthreads_unblock(void)
+{
+	struct console *con;
+
+	for_each_console(con) {
+		mutex_lock(&con->lock);
+		con->flags &= ~CON_THD_BLOCKED;
+		mutex_unlock(&con->lock);
+	}
+
+	console_kthreads_blocked = false;
+}
+
 /*
  * This is used for debugging the mess that is the VT code by
  * keeping track if we have the console semaphore held. It's
@@ -2603,13 +2673,6 @@ void resume_console(void)
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
-
-	/*
-	 * While suspended, new records may have been added to the
-	 * ringbuffer. Wake up the kthread printers to print them.
-	 */
-	wake_up_klogd();
-
 	pr_flush(1000, true);
 }
 
@@ -2628,9 +2691,14 @@ static int console_cpu_notify(unsigned int cpu)
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
-
-		/* Wake kthread printers. Some may have become usable. */
-		wake_up_klogd();
+		else {
+			/*
+			 * If a new CPU comes online, the conditions for
+			 * printer_should_wake() may have changed for some
+			 * kthread printer with !CON_ANYTIME.
+			 */
+			wake_up_klogd();
+		}
 	}
 	return 0;
 }
@@ -2650,6 +2718,7 @@ void console_lock(void)
 	down_console_sem();
 	if (console_suspended)
 		return;
+	console_kthreads_block();
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2671,6 +2740,10 @@ int console_trylock(void)
 		up_console_sem();
 		return 0;
 	}
+	if (!console_kthreads_atomic_tryblock()) {
+		up_console_sem();
+		return 0;
+	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -2679,7 +2752,7 @@ EXPORT_SYMBOL(console_trylock);
 
 int is_console_locked(void)
 {
-	return console_locked;
+	return (console_locked || atomic_read(&console_kthreads_active));
 }
 EXPORT_SYMBOL(is_console_locked);
 
@@ -2723,7 +2796,7 @@ static inline bool __console_is_usable(short flags)
  * Check if the given console is currently capable and allowed to print
  * records.
  *
- * Requires the console_lock.
+ * Requires holding the console_lock or con->lock.
  */
 static inline bool console_is_usable(struct console *con)
 {
@@ -2736,6 +2809,22 @@ static inline bool console_is_usable(struct console *con)
 static void __console_unlock(void)
 {
 	console_locked = 0;
+
+	/*
+	 * Depending on whether console_lock() or console_trylock() was used,
+	 * appropriately allow the kthread printers to continue.
+	 */
+	if (console_kthreads_blocked)
+		console_kthreads_unblock();
+	else
+		console_kthreads_atomic_unblock();
+
+	/*
+	 * New records may have arrived while the console was locked.
+	 * Wake the kthread printers to print them.
+	 */
+	wake_up_klogd();
+
 	up_console_sem();
 }
 
@@ -2753,17 +2842,19 @@ static void __console_unlock(void)
  *
  * @handover will be set to true if a printk waiter has taken over the
  * console_lock, in which case the caller is no longer holding the
- * console_lock. Otherwise it is set to false.
+ * console_lock. Otherwise it is set to false. A NULL pointer may be provided
+ * to disable allowing the console_lock to be taken over by a printk waiter.
  *
  * Returns false if the given console has no next record to print, otherwise
  * true.
  *
- * Requires the console_lock.
+ * Requires the console_lock if @handover is non-NULL.
+ * Requires con->lock otherwise.
  */
-static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
-				     char *dropped_text, bool *handover)
+static bool __console_emit_next_record(struct console *con, char *text, char *ext_text,
+				       char *dropped_text, bool *handover)
 {
-	static int panic_console_dropped;
+	static atomic_t panic_console_dropped = ATOMIC_INIT(0);
 	struct printk_info info;
 	struct printk_record r;
 	unsigned long flags;
@@ -2772,7 +2863,8 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 
 	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
-	*handover = false;
+	if (handover)
+		*handover = false;
 
 	if (!prb_read_valid(prb, con->seq, &r))
 		return false;
@@ -2780,7 +2872,8 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 	if (con->seq != r.info->seq) {
 		con->dropped += r.info->seq - con->seq;
 		con->seq = r.info->seq;
-		if (panic_in_progress() && panic_console_dropped++ > 10) {
+		if (panic_in_progress() &&
+		    atomic_fetch_inc_relaxed(&panic_console_dropped) > 10) {
 			suppress_panic_printk = 1;
 			pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
 		}
@@ -2802,31 +2895,61 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
 	}
 
-	/*
-	 * While actively printing out messages, if another printk()
-	 * were to occur on another CPU, it may wait for this one to
-	 * finish. This task can not be preempted if there is a
-	 * waiter waiting to take over.
-	 *
-	 * Interrupts are disabled because the hand over to a waiter
-	 * must not be interrupted until the hand over is completed
-	 * (@console_waiter is cleared).
-	 */
-	printk_safe_enter_irqsave(flags);
-	console_lock_spinning_enable();
+	if (handover) {
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 *
+		 * Interrupts are disabled because the hand over to a waiter
+		 * must not be interrupted until the hand over is completed
+		 * (@console_waiter is cleared).
+		 */
+		printk_safe_enter_irqsave(flags);
+		console_lock_spinning_enable();
+
+		/* don't trace irqsoff print latency */
+		stop_critical_timings();
+	}
 
-	stop_critical_timings();	/* don't trace print latency */
 	call_console_driver(con, write_text, len, dropped_text);
-	start_critical_timings();
 
 	con->seq++;
 
-	*handover = console_lock_spinning_disable_and_check();
-	printk_safe_exit_irqrestore(flags);
+	if (handover) {
+		start_critical_timings();
+		*handover = console_lock_spinning_disable_and_check();
+		printk_safe_exit_irqrestore(flags);
+	}
 skip:
 	return true;
 }
 
+/*
+ * Print a record for a given console, but allow another printk() caller to
+ * take over the console_lock and continue printing.
+ *
+ * Requires the console_lock, but depending on @handover after the call, the
+ * caller may no longer have the console_lock.
+ *
+ * See __console_emit_next_record() for argument and return details.
+ */
+static bool console_emit_next_record_transferable(struct console *con, char *text, char *ext_text,
+						  char *dropped_text, bool *handover)
+{
+	/*
+	 * Handovers are only supported if threaded printers are atomically
+	 * blocked. The context taking over the console_lock may be atomic.
+	 */
+	if (!console_kthreads_atomically_blocked()) {
+		*handover = false;
+		handover = NULL;
+	}
+
+	return __console_emit_next_record(con, text, ext_text, dropped_text, handover);
+}
+
 /*
  * Print out all remaining records to all consoles.
  *
@@ -2878,13 +3001,11 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 
 			if (con->flags & CON_EXTENDED) {
 				/* Extended consoles do not print "dropped messages". */
-				progress = console_emit_next_record(con, &text[0],
-								    &ext_text[0], NULL,
-								    handover);
+				progress = console_emit_next_record_transferable(con, &text[0],
+								&ext_text[0], NULL, handover);
 			} else {
-				progress = console_emit_next_record(con, &text[0],
-								    NULL, &dropped_text[0],
-								    handover);
+				progress = console_emit_next_record_transferable(con, &text[0],
+								NULL, &dropped_text[0], handover);
 			}
 			if (*handover)
 				return false;
@@ -2999,6 +3120,10 @@ void console_unblank(void)
 	if (oops_in_progress) {
 		if (down_trylock_console_sem() != 0)
 			return;
+		if (!console_kthreads_atomic_tryblock()) {
+			up_console_sem();
+			return;
+		}
 	} else
 		console_lock();
 
@@ -3062,6 +3187,16 @@ struct tty_driver *console_device(int *index)
 	return driver;
 }
 
+/*
+ * Since the kthread printers do not acquire the console_lock but do need to
+ * access @flags, they could experience races because other tasks
+ * (synchronizing using the console_lock) can modify @flags. These macros are
+ * available to at least provide atomic variable updates so that the kthread
+ * printers can see consistent values.
+ */
+#define console_flags_set(var, flag)	WRITE_ONCE(var, READ_ONCE(var) | flag)
+#define console_flags_clear(var, flag)	WRITE_ONCE(var, READ_ONCE(var) & ~flag)
+
 /*
  * Prevent further output on the passed console device so that (for example)
  * serial drivers can disable console output before suspending a port, and can
@@ -3071,20 +3206,23 @@ void console_stop(struct console *console)
 {
 	__pr_flush(console, 1000, true);
 	console_lock();
-	console->flags &= ~CON_ENABLED;
+
+	/* Can cause races for printk_kthread_func(). */
+	console_flags_clear(console->flags, CON_ENABLED);
+
 	console_unlock();
 }
 EXPORT_SYMBOL(console_stop);
 
+
 void console_start(struct console *console)
 {
 	console_lock();
-	console->flags |= CON_ENABLED;
-	console_unlock();
 
-	/* Wake the newly enabled kthread printer. */
-	wake_up_klogd();
+	/* Can cause races for printk_kthread_func(). */
+	console_flags_set(console->flags, CON_ENABLED);
 
+	console_unlock();
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -3286,6 +3424,8 @@ void register_console(struct console *newcon)
 
 	newcon->dropped = 0;
 	newcon->thread = NULL;
+	newcon->flags |= CON_THD_BLOCKED;
+	mutex_init(&newcon->lock);
 
 	if (newcon->flags & CON_PRINTBUFFER) {
 		/* Get a consistent copy of @syslog_seq. */
@@ -3363,10 +3503,13 @@ int unregister_console(struct console *console)
 	 * If this isn't the last console and it has CON_CONSDEV set, we
 	 * need to set it on the next preferred console.
 	 */
-	if (console_drivers != NULL && console->flags & CON_CONSDEV)
-		console_drivers->flags |= CON_CONSDEV;
+	if (console_drivers != NULL && console->flags & CON_CONSDEV) {
+		/* Can cause races for printk_kthread_func(). */
+		console_flags_set(console_drivers->flags, CON_CONSDEV);
+	}
 
-	console->flags &= ~CON_ENABLED;
+	/* Can cause races for printk_kthread_func(). */
+	console_flags_clear(console->flags, CON_ENABLED);
 
 	/*
 	 * console->thread can only be cleared under the console lock. But
@@ -3389,7 +3532,9 @@ int unregister_console(struct console *console)
 	return res;
 
 out_disable_unlock:
-	console->flags &= ~CON_ENABLED;
+	/* Can cause races for printk_kthread_func(). */
+	console_flags_clear(console->flags, CON_ENABLED);
+
 	console_unlock();
 
 	return res;
@@ -3586,6 +3731,19 @@ static void printk_fallback_preferred_direct(void)
 	console_unlock();
 }
 
+/*
+ * Print a record for a given console, not allowing another printk() caller
+ * to take over. This is appropriate for contexts that do not have the
+ * console_lock.
+ *
+ * See __console_emit_next_record() for argument and return details.
+ */
+static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
+				     char *dropped_text)
+{
+	return __console_emit_next_record(con, text, ext_text, dropped_text, NULL);
+}
+
 static bool printer_should_wake(struct console *con, u64 seq)
 {
 	short flags;
@@ -3593,9 +3751,6 @@ static bool printer_should_wake(struct console *con, u64 seq)
 	if (kthread_should_stop() || !printk_kthreads_available)
 		return true;
 
-	if (console_suspended)
-		return false;
-
 	/*
 	 * This is an unsafe read from con->flags, but a false positive is
 	 * not a problem. Worst case it would allow the printer to wake up
@@ -3607,6 +3762,11 @@ static bool printer_should_wake(struct console *con, u64 seq)
 	if (!__console_is_usable(flags))
 		return false;
 
+	if ((flags & CON_THD_BLOCKED) ||
+	    console_kthreads_atomically_blocked()) {
+		return false;
+	}
+
 	return prb_read_valid(prb, seq, NULL);
 }
 
@@ -3615,7 +3775,7 @@ static int printk_kthread_func(void *data)
 	struct console *con = data;
 	char *dropped_text = NULL;
 	char *ext_text = NULL;
-	bool handover;
+	short flags;
 	u64 seq = 0;
 	char *text;
 	int error;
@@ -3665,15 +3825,25 @@ static int printk_kthread_func(void *data)
 		if (error)
 			continue;
 
-		console_lock();
+		error = mutex_lock_interruptible(&con->lock);
+		if (error)
+			continue;
 
-		if (console_suspended) {
-			up_console_sem();
+		/*
+		 * Reading @flags could race with console_stop(),
+		 * console_start(), or console_unregister(). READ_ONCE() is
+		 * console_start(), or unregister_console(). READ_ONCE() is
+		 */
+		flags = data_race(READ_ONCE(con->flags));
+
+		if (!__console_is_usable(flags)) {
+			mutex_unlock(&con->lock);
 			continue;
 		}
 
-		if (!console_is_usable(con)) {
-			__console_unlock();
+		if ((flags & CON_THD_BLOCKED) ||
+		    !console_kthread_printing_tryenter()) {
+			mutex_unlock(&con->lock);
 			continue;
 		}
 
@@ -3686,13 +3856,13 @@ static int printk_kthread_func(void *data)
 		 * which can conditionally invoke cond_resched().
 		 */
 		console_may_schedule = 0;
-		console_emit_next_record(con, text, ext_text, dropped_text, &handover);
-		if (handover)
-			continue;
+		console_emit_next_record(con, text, ext_text, dropped_text);
 
 		seq = con->seq;
 
-		__console_unlock();
+		console_kthread_printing_exit();
+
+		mutex_unlock(&con->lock);
 	}
 
 	con_printk(KERN_INFO, con, "printing thread stopped\n");
-- 
2.30.2


* [PATCH printk v4 15/15] printk: remove @console_locked
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (13 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
@ 2022-04-21 21:22 ` John Ogness
  2022-04-22  9:39 ` [PATCH printk v4 00/15] implement threaded console printing Petr Mladek
  15 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The static global variable @console_locked is used to help debug
VT code to make sure that certain code paths are running with
the console_lock held. However, this information is also available
with the static global variable @console_kthreads_blocked (for
locking via console_lock()), and the static global variable
@console_kthreads_active (for locking via console_trylock()).

Remove @console_locked and update is_console_locked() to use the
alternative variables.
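
The resulting check can be modelled in userspace as follows. This is a
sketch only, with hypothetical dbg_* names and C11 atomics standing in
for the kernel variables. Note that, as the check is written, any
non-zero counter value reports "locked", which covers the -1 set by
console_trylock() and also a positive count of active kthread printers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models @console_kthreads_blocked (set via console_lock()). */
static bool dbg_kthreads_blocked;

/* Models @console_kthreads_active (-1 via console_trylock()). */
static atomic_int dbg_kthreads_active;

/* Models is_console_locked() after @console_locked is removed. */
static bool dbg_is_console_locked(void)
{
	return dbg_kthreads_blocked ||
	       atomic_load(&dbg_kthreads_active) != 0;
}
```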

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7243a85564ef..f4a939304a12 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -340,15 +340,7 @@ static void console_kthreads_unblock(void)
 	console_kthreads_blocked = false;
 }
 
-/*
- * This is used for debugging the mess that is the VT code by
- * keeping track if we have the console semaphore held. It's
- * definitely not the perfect debug tool (we don't know if _WE_
- * hold it and are racing, but it helps tracking those weird code
- * paths in the console code where we end up in places I want
- * locked without the console semaphore held).
- */
-static int console_locked, console_suspended;
+static int console_suspended;
 
 /*
  *	Array of consoles built from command line options (console=)
@@ -2719,7 +2711,6 @@ void console_lock(void)
 	if (console_suspended)
 		return;
 	console_kthreads_block();
-	console_locked = 1;
 	console_may_schedule = 1;
 }
 EXPORT_SYMBOL(console_lock);
@@ -2744,15 +2735,26 @@ int console_trylock(void)
 		up_console_sem();
 		return 0;
 	}
-	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
 
+/*
+ * This is used to help to make sure that certain paths within the VT code are
+ * running with the console lock held. It is definitely not the perfect debug
+ * tool (it is not known if the VT code is the task holding the console lock),
+ * but it helps tracking those weird code paths in the console code such as
+ * when the console is suspended: where the console is not locked but no
+ * console printing may occur.
+ *
+ * Note: This returns true when the console is suspended but is not locked.
+ *       This is intentional because the VT code must consider that situation
+ *       the same as if the console was locked.
+ */
 int is_console_locked(void)
 {
-	return (console_locked || atomic_read(&console_kthreads_active));
+	return (console_kthreads_blocked || atomic_read(&console_kthreads_active));
 }
 EXPORT_SYMBOL(is_console_locked);
 
@@ -2808,8 +2810,6 @@ static inline bool console_is_usable(struct console *con)
 
 static void __console_unlock(void)
 {
-	console_locked = 0;
-
 	/*
 	 * Depending on whether console_lock() or console_trylock() was used,
 	 * appropriately allow the kthread printers to continue.
@@ -3127,7 +3127,6 @@ void console_unblank(void)
 	} else
 		console_lock();
 
-	console_locked = 1;
 	console_may_schedule = 0;
 	for_each_console(c)
 		if ((c->flags & CON_ENABLED) && c->unblank)
-- 
2.30.2


* Re: [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support
  2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
@ 2022-04-21 21:40   ` John Ogness
  2022-04-22  9:21   ` Petr Mladek
  2022-04-25 20:58   ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking John Ogness
  2 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-21 21:40 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Hi Petr,

If v4 ends up being acceptable for linux-next, I would request you fold
a couple cosmetic changes into this patch.

On 2022-04-21, John Ogness <john.ogness@linutronix.de> wrote:
> +/*
> + * Since the kthread printers do not acquire the console_lock but do need to
> + * access @flags, they could experience races because other tasks
> + * (synchronizing using the console_lock) can modify @flags. These macros are
> + * available to at least provide atomic variable updates so that the kthread
> + * printers can see consistent values.

This last sentence is bad. It should not use the words "atomic" and
"updates". Please change it to:

    These macros are available to store the new value in a way that will
    provide consistent load values for kthread printers. Tasks using
    these macros must still do so under the console_lock.
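
To illustrate the intent of the reworded comment, here is a userspace
sketch. ONCE_READ() and ONCE_WRITE() are simplified, hypothetical
stand-ins for the kernel's READ_ONCE() and WRITE_ONCE(), and
struct flag_console and FLAG_CON_ENABLED are made up for the example.
The writer is assumed to be serialized by the console_lock. The reader
takes one snapshot and tests only that snapshot.

```c
/* Simplified volatile-cast stand-ins (GNU __typeof__ extension). */
#define ONCE_READ(x)		(*(const volatile __typeof__(x) *)&(x))
#define ONCE_WRITE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define flag_set(var, flag)	ONCE_WRITE(var, ONCE_READ(var) | (flag))
#define flag_clear(var, flag)	ONCE_WRITE(var, ONCE_READ(var) & ~(flag))

#define FLAG_CON_ENABLED	4	/* hypothetical flag bit */

struct flag_console {
	short flags;
};

/* Writer side: the caller is assumed to hold the console_lock. */
static void flag_console_start(struct flag_console *con)
{
	flag_set(con->flags, FLAG_CON_ENABLED);
}

static void flag_console_stop(struct flag_console *con)
{
	flag_clear(con->flags, FLAG_CON_ENABLED);
}

/* Reader side: a single load gives one consistent value to test. */
static short flag_console_snapshot(struct flag_console *con)
{
	return ONCE_READ(con->flags);
}
```

The read-modify-write in flag_set() and flag_clear() is not atomic as a
whole, which is why the writers still need the console_lock.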

[...]

>  EXPORT_SYMBOL(console_stop);
>  
> +

Please remove this accidental blank line.

>  void console_start(struct console *console)
>  {
>  	console_lock();
> -	console->flags |= CON_ENABLED;
> -	console_unlock();
>  
> -	/* Wake the newly enabled kthread printer. */
> -	wake_up_klogd();
> +	/* Can cause races for printk_kthread_func(). */
> +	console_flags_set(console->flags, CON_ENABLED);
>  
> +	console_unlock();
>  	__pr_flush(console, 1000, true);
>  }
>  EXPORT_SYMBOL(console_start);

John

* Re: [PATCH printk v4 13/15] printk: add kthread console printers
  2022-04-21 21:22 ` [PATCH printk v4 13/15] printk: add kthread console printers John Ogness
@ 2022-04-22  7:48   ` Petr Mladek
  0 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-22  7:48 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2022-04-21 23:28:48, John Ogness wrote:
> Create a kthread for each console to perform console printing. During
> normal operation (@system_state == SYSTEM_RUNNING), the kthread
> printers are responsible for all printing on their respective
> consoles.
> 
> During non-normal operation, console printing is done as it has been:
> within the context of the printk caller or within irqwork triggered
> by the printk caller, referred to as direct printing.
> 
> Since threaded console printers are responsible for all printing
> during normal operation, this also includes messages generated via
> deferred printk calls. If direct printing is in effect during a
> deferred printk call, the queued irqwork will perform the direct
> printing. To make it clear that this is the only time that the
> irqwork will perform direct printing, rename the flag
> PRINTK_PENDING_OUTPUT to PRINTK_PENDING_DIRECT_OUTPUT.
> 
> Threaded console printers synchronize against each other and against
> console lockers by taking the console lock for each message that is
> printed.
> 
> Note that the kthread printers do not care about direct printing.
> They will always try to print if new records are available. They can
> be blocked by direct printing, but will be woken again once direct
> printing is finished.
> 
> Console unregistration is a bit tricky because the associated
> kthread printer cannot be stopped while the console lock is held.
> A policy is implemented that states: whichever task clears
> con->thread (under the console lock) is responsible for stopping
> the kthread. unregister_console() will clear con->thread while
> the console lock is held and then stop the kthread after releasing
> the console lock.
> 
> For consoles that have implemented the exit() callback, the kthread
> is stopped before exit() is called.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Looks good to me.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support
  2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
  2022-04-21 21:40   ` John Ogness
@ 2022-04-22  9:21   ` Petr Mladek
  2022-04-25 20:58   ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking John Ogness
  2 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-22  9:21 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Thu 2022-04-21 23:28:49, John Ogness wrote:
> Currently threaded console printers synchronize against each
> other using console_lock(). However, different console drivers
> are unrelated and do not require any synchronization between
> each other. Removing the synchronization between the threaded
> console printers will allow each console to print at its own
> speed.
> 
> But the threaded console printers do still need to synchronize
> against console_lock() callers. Introduce a per-console mutex
> and a new console flag CON_THD_BLOCKED to provide this
> synchronization.
> 
> console_lock() is modified so that it must acquire the mutex
> of each console in order to set the CON_THD_BLOCKED flag.
> Console printing threads will acquire their mutex while
> printing a record. If CON_THD_BLOCKED was set, the thread will
> go back to sleep instead of printing.
> 
> The reason for the CON_THD_BLOCKED flag is so that
> console_lock() callers do not need to acquire multiple console
> mutexes simultaneously, which would introduce unnecessary
> complexity due to nested mutex locking.
> 
> Threaded console printers also need to synchronize against
> console_trylock() callers. Since console_trylock() may be
> called from any context, the per-console mutex cannot be used
> for this synchronization. (mutex_trylock() cannot be called
> from atomic contexts.) Introduce a global atomic counter to
> identify if any threaded printers are active. The threaded
> printers will also check the atomic counter to identify if the
> console has been locked by another task via console_trylock().
> 
> Note that @console_sem is still used to provide synchronization
> between console_lock() and console_trylock() callers.
> 
> A locking overview for console_lock(), console_trylock(), and the
> threaded printers is as follows (pseudo code):
> 
> console_lock()
> {
>         down(&console_sem);
>         for_each_console(con) {
>                 mutex_lock(&con->lock);
>                 con->flags |= CON_THD_BLOCKED;
>                 mutex_unlock(&con->lock);
>         }
>         /* console_lock acquired */
> }
> 
> console_trylock()
> {
>         if (down_trylock(&console_sem) == 0) {
>                 if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>                         /* console_lock acquired */
>                 }
>         }
> }
> 
> threaded_printer()
> {
>         mutex_lock(&con->lock);
>         if (!(con->flags & CON_THD_BLOCKED)) {
>                 /* console_lock() callers blocked */
> 
>                 if (atomic_inc_unless_negative(&console_kthreads_active)) {
>                         /* console_trylock() callers blocked */
> 
>                         con->write();
> 
>                         atomic_dec(&console_kthreads_active);
>                 }
>         }
>         mutex_unlock(&con->lock);
> }
> 
> The console owner and waiter logic now only applies between contexts
> that have taken the console_lock via console_trylock(). Threaded
> printers never take the console_lock, so they do not have a
> console_lock to hand over. Tasks that have used console_lock() will
> block the threaded printers using a mutex and if the console_lock
> is handed over to an atomic context, it would be unable to unblock
> the threaded printers. However, the console_trylock() case is
> really the only scenario that is interesting for handovers anyway.
> 
> @panic_console_dropped must change to atomic_t since it is no longer
> protected exclusively by the console_lock.

I have finally understood why console_lock_single_hold() solved
the problem with con->flags. We should describe it in the commit
message as well. Something like:

    @con->flags must be updated using WRITE_ONCE() under console_lock
    when the related kthread is running. The kthread printers
    read the flags only under con->lock. They have to see
    the CON_THD_BLOCKED flag even when the value might not
    be consistent.

Sigh, I agree that this approach is error-prone and kind of ugly.
The approach with console_lock_single_hold() was not ideal either.
I still have to think about it.

Anyway, the approach with READ_ONCE()/WRITE_ONCE() looks good enough
for now.
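
To make the READ_ONCE()/WRITE_ONCE() idea concrete, here is a minimal
user-space sketch. Everything in it is an assumption made for
illustration: the MODEL_* macros are simplified volatile-cast
approximations (not the kernel definitions), struct flags_console and
the bit values are hypothetical, and the helpers only imitate what
flag set/clear macros could look like. The point is that the writer
computes the new value first and publishes it with a single store.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel macros (assumption, not kernel code). */
#define MODEL_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical bit values, chosen only for this example. */
#define MODEL_CON_ENABLED	0x1
#define MODEL_CON_THD_BLOCKED	0x2

/* Hypothetical stand-in for the part of struct console that matters here. */
struct flags_console {
	short flags;
};

/* Compute the new value, then publish it with one store. */
void model_flags_set(struct flags_console *con, short bits)
{
	MODEL_WRITE_ONCE(con->flags,
			 (short)(MODEL_READ_ONCE(con->flags) | bits));
}

void model_flags_clear(struct flags_console *con, short bits)
{
	MODEL_WRITE_ONCE(con->flags,
			 (short)(MODEL_READ_ONCE(con->flags) & ~bits));
}

/* Reader side: one load of the whole flags word, then test the bit. */
bool model_thread_blocked(struct flags_console *con)
{
	return (MODEL_READ_ONCE(con->flags) & MODEL_CON_THD_BLOCKED) != 0;
}
```

Note that this does not make the read-modify-write atomic. It only
relies on the writers already being serialized by some outer lock,
as described above.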


> Since threaded printers remain asleep if they see that the console
> is locked, they now must be explicitly woken in __console_unlock().
> This means wake_up_klogd() calls following a console_unlock() are
> no longer necessary and are removed.
> 
> Also note that threaded printers no longer need to check
> @console_suspended. The check for the CON_THD_BLOCKED flag
> implicitly covers the suspended console case.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

I do not see any further problem with this patch. So, with
the updated commit message and comment above the macros:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH printk v4 00/15] implement threaded console printing
  2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
                   ` (14 preceding siblings ...)
  2022-04-21 21:22 ` [PATCH printk v4 15/15] printk: remove @console_locked John Ogness
@ 2022-04-22  9:39 ` Petr Mladek
  2022-04-22 20:29   ` Petr Mladek
  15 siblings, 1 reply; 99+ messages in thread
From: Petr Mladek @ 2022-04-22  9:39 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Randy Dunlap, Marco Elver,
	Stephen Boyd, Alexander Potapenko, Nicholas Piggin,
	Greg Kroah-Hartman, Jiri Slaby, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Kees Cook,
	Luis Chamberlain, Xiaoming Ni, Peter Zijlstra, Andy Shevchenko,
	Corey Minyard, Bjorn Andersson, Sebastian Andrzej Siewior,
	Mark Brown, Daniel Lezcano, Matti Vaittinen, Dmitry Torokhov,
	Eric W. Biederman, Shawn Guo, Wang Qing, rcu

On Thu 2022-04-21 23:28:35, John Ogness wrote:
> This is v4 of a series to implement a kthread for each registered
> console. v3 is here [0]. The kthreads locklessly retrieve the
> records from the printk ringbuffer and also do not cause any lock
> contention between each other. This allows consoles to run at full
> speed. For example, a netconsole is able to dump records much
> faster than a serial or vt console. Also, during normal operation,
> printk() callers are completely decoupled from console printing.
> 
> There are situations where kthread printing is not sufficient. For
> example, during panic situations, where the kthreads may not get a
> chance to schedule. In such cases, the current method of attempting
> to print directly within the printk() caller context is used. New
> functions printk_prefer_direct_enter() and
> printk_prefer_direct_exit() are made available to mark areas of the
> kernel where direct printing is preferred. (These should only be
> areas that do not occur during normal operation.)
> 
> This series also introduces pr_flush(): a might_sleep() function
> that will block until all active printing threads have caught up
> to the latest record at the time of the pr_flush() call. This
> function is useful, for example, to wait until pending records
> are flushed to consoles before suspending.
> 
> Note that this series does *not* increase the reliability of console
> printing. Rather it focuses on the non-interference aspect of
> printk() by decoupling printk() callers from printing (during normal
> operation). Nonetheless, the reliability aspect should not worsen
> due to this series.

This version looks good enough for linux-next. I do not see any
functional problem and it should work as designed. It is time to
see how it works in various "real life" workloads.

I am going to push it later today unless anyone (John) complains ;-)

Best Regards,
Petr


* Re: [PATCH printk v4 00/15] implement threaded console printing
  2022-04-22  9:39 ` [PATCH printk v4 00/15] implement threaded console printing Petr Mladek
@ 2022-04-22 20:29   ` Petr Mladek
  0 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-22 20:29 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Andrew Morton, Randy Dunlap, Marco Elver,
	Stephen Boyd, Alexander Potapenko, Nicholas Piggin,
	Greg Kroah-Hartman, Jiri Slaby, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Kees Cook,
	Luis Chamberlain, Xiaoming Ni, Peter Zijlstra, Andy Shevchenko,
	Corey Minyard, Bjorn Andersson, Sebastian Andrzej Siewior,
	Mark Brown, Daniel Lezcano, Matti Vaittinen, Dmitry Torokhov,
	Eric W. Biederman, Shawn Guo, Wang Qing, rcu

On Fri 2022-04-22 11:39:59, Petr Mladek wrote:
> On Thu 2022-04-21 23:28:35, John Ogness wrote:
> > This is v4 of a series to implement a kthread for each registered
> > console. v3 is here [0]. The kthreads locklessly retrieve the
> > records from the printk ringbuffer and also do not cause any lock
> > contention between each other. This allows consoles to run at full
> > speed. For example, a netconsole is able to dump records much
> > faster than a serial or vt console. Also, during normal operation,
> > printk() callers are completely decoupled from console printing.
> > 
> > There are situations where kthread printing is not sufficient. For
> > example, during panic situations, where the kthreads may not get a
> > chance to schedule. In such cases, the current method of attempting
> > to print directly within the printk() caller context is used. New
> > functions printk_prefer_direct_enter() and
> > printk_prefer_direct_exit() are made available to mark areas of the
> > kernel where direct printing is preferred. (These should only be
> > areas that do not occur during normal operation.)
> > 
> > This series also introduces pr_flush(): a might_sleep() function
> > that will block until all active printing threads have caught up
> > to the latest record at the time of the pr_flush() call. This
> > function is useful, for example, to wait until pending records
> > are flushed to consoles before suspending.
> > 
> > Note that this series does *not* increase the reliability of console
> > printing. Rather it focuses on the non-interference aspect of
> > printk() by decoupling printk() callers from printing (during normal
> > operation). Nonetheless, the reliability aspect should not worsen
> > due to this series.
> 
> This version looks good enough for linux-next. I do not see any
> functional problem and it should work as designed. It is time to
> see how it works in various "real life" workloads.
> 
> I am going to push it later today unless anyone (John) complains ;-)

I have pushed the patchset into printk/linux.git, branch
rework/kthreads. I have also merged it into the for-next branch.

We are still discussing a better solution for the complicated locking
scheme[0]. The main purpose is to make it easier and safer to use.
Anyway, the current code looks safe. Any potential improvements
should not affect the behavior.

So, it is time to test it in linux-next. Let's see how it survives
the hammering of various robots and people testing on linux-next.
I keep my fingers crossed.

Best Regards,
Petr


* [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
  2022-04-21 21:40   ` John Ogness
  2022-04-22  9:21   ` Petr Mladek
@ 2022-04-25 20:58   ` John Ogness
  2022-04-26 12:07     ` Petr Mladek
  2022-06-22  9:03       ` Geert Uytterhoeven
  2 siblings, 2 replies; 99+ messages in thread
From: John Ogness @ 2022-04-25 20:58 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

Currently threaded console printers synchronize against each
other using console_lock(). However, different console drivers
are unrelated and do not require any synchronization between
each other. Removing the synchronization between the threaded
console printers will allow each console to print at its own
speed.

But the threaded console printers do still need to synchronize
against console_lock() callers. Introduce a per-console mutex
and a new console boolean field @blocked to provide this
synchronization.

console_lock() is modified so that it must acquire the mutex
of each console in order to set the @blocked field. Console
printing threads will acquire their mutex while printing a
record. If @blocked was set, the thread will go back to sleep
instead of printing.

The reason for the @blocked boolean field is so that
console_lock() callers do not need to acquire multiple console
mutexes simultaneously, which would introduce unnecessary
complexity due to nested mutex locking. Also, a new field
was chosen instead of adding a new @flags value so that the
blocked status could be checked without concern of reading
inconsistent values due to @flags updates from other contexts.
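
As a rough illustration of the scheme described above, the following
user-space sketch models the per-console mutex and the @blocked field
with pthreads. It is not the kernel code: struct model_console and the
model_*() helpers are hypothetical names invented for this example,
and pthread_mutex_t only stands in for struct mutex. It shows that the
blocker never holds more than one console mutex at a time.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct console. */
struct model_console {
	pthread_mutex_t lock;	/* models the per-console mutex */
	bool blocked;		/* models @blocked, accessed under @lock */
	unsigned int printed;	/* number of records "printed" so far */
};

void model_console_init(struct model_console *con)
{
	pthread_mutex_init(&con->lock, NULL);
	con->blocked = true;	/* a new console starts out blocked */
	con->printed = 0;
}

/*
 * Set or clear @blocked on every console. Each mutex is taken and
 * released in turn, so no nested mutex locking is needed.
 */
void model_set_blocked(struct model_console *cons, size_t n, bool blocked)
{
	for (size_t i = 0; i < n; i++) {
		pthread_mutex_lock(&cons[i].lock);
		cons[i].blocked = blocked;
		pthread_mutex_unlock(&cons[i].lock);
	}
}

/* One printer iteration. Returns true if a record was "printed". */
bool model_printer_step(struct model_console *con)
{
	bool did_print = false;

	pthread_mutex_lock(&con->lock);
	if (!con->blocked) {
		con->printed++;		/* stands in for con->write() */
		did_print = true;
	}
	pthread_mutex_unlock(&con->lock);

	return did_print;
}
```

Because the printer holds its own mutex for the whole step, the
blocker cannot set @blocked in the middle of a write. It simply waits
for that one console and then moves on to the next.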

Threaded console printers also need to synchronize against
console_trylock() callers. Since console_trylock() may be
called from any context, the per-console mutex cannot be used
for this synchronization. (mutex_trylock() cannot be called
from atomic contexts.) Introduce a global atomic counter to
identify if any threaded printers are active. The threaded
printers will also check the atomic counter to identify if the
console has been locked by another task via console_trylock().

Note that @console_sem is still used to provide synchronization
between console_lock() and console_trylock() callers.

A locking overview for console_lock(), console_trylock(), and the
threaded printers is as follows (pseudo code):

console_lock()
{
        down(&console_sem);
        for_each_console(con) {
                mutex_lock(&con->lock);
                con->blocked = true;
                mutex_unlock(&con->lock);
        }
        /* console_lock acquired */
}

console_trylock()
{
        if (down_trylock(&console_sem) == 0) {
                if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
                        /* console_lock acquired */
                }
        }
}

threaded_printer()
{
        mutex_lock(&con->lock);
        if (!con->blocked) {
                /* console_lock() callers blocked */

                if (atomic_inc_unless_negative(&console_kthreads_active)) {
                        /* console_trylock() callers blocked */

                        con->write();

                        atomic_dec(&console_kthreads_active);
                }
        }
        mutex_unlock(&con->lock);
}
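
The counter protocol used for the console_trylock() case can also be
tried out in user space. The sketch below uses C11 atomics and is only
a model under stated assumptions: kthreads_active and the model_*()
functions are hypothetical names, and the compare-exchange loop is one
possible way to express an "increment unless negative" operation, not
necessarily how the kernel helper is implemented.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* -1 = blocked via trylock, 0 = idle, >0 = number of active printers */
static atomic_int kthreads_active = 0;

/* Models the trylock side: succeeds only if no printer is active. */
bool model_atomic_tryblock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&kthreads_active, &expected, -1);
}

/* Models the unlock side: -1 -> 0, no effect for any other value. */
void model_atomic_unblock(void)
{
	int expected = -1;

	atomic_compare_exchange_strong(&kthreads_active, &expected, 0);
}

/* Models a printer entering: increment unless the counter is negative. */
bool model_printing_tryenter(void)
{
	int cur = atomic_load(&kthreads_active);

	while (cur >= 0) {
		/* On failure, cur is updated to the current value. */
		if (atomic_compare_exchange_weak(&kthreads_active, &cur, cur + 1))
			return true;
	}

	return false;
}

/* Models a printer leaving: undo the increment from tryenter. */
void model_printing_exit(void)
{
	atomic_fetch_sub(&kthreads_active, 1);
}

int model_active_value(void)
{
	return atomic_load(&kthreads_active);
}
```

A trylock caller and a printer can never both succeed: the caller
needs the counter to be exactly 0 and the printer needs it to be
non-negative, and each transition is a single atomic operation.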

The console owner and waiter logic now only applies between contexts
that have taken the console_lock via console_trylock(). Threaded
printers never take the console_lock, so they do not have a
console_lock to hand over. Tasks that have used console_lock() will
block the threaded printers using a mutex and if the console_lock
is handed over to an atomic context, it would be unable to unblock
the threaded printers. However, the console_trylock() case is
really the only scenario that is interesting for handovers anyway.

@panic_console_dropped must change to atomic_t since it is no longer
protected exclusively by the console_lock.

Since threaded printers remain asleep if they see that the console
is locked, they now must be explicitly woken in __console_unlock().
This means wake_up_klogd() calls following a console_unlock() are
no longer necessary and are removed.

Also note that threaded printers no longer need to check
@console_suspended. The check for the @blocked field implicitly
covers the suspended console case.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---

 Changes since v4 of this patch:

 - Use new @blocked field instead of CON_THD_BLOCKED flag.

 - Remove console_flags_set()/console_flags_clear() macros for
   updating @flags (and remove their race comments).

 - For printer_should_wake() and printk_kthread_func(), check
   @blocked before checking @flags.

 - Update commit message and comments appropriately.

 include/linux/console.h |  15 +++
 kernel/printk/printk.c  | 261 +++++++++++++++++++++++++++++++---------
 2 files changed, 220 insertions(+), 56 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 9a251e70c090..143653090c48 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
 
 #include <linux/atomic.h>
 #include <linux/types.h>
+#include <linux/mutex.h>
 
 struct vc_data;
 struct console_font_op;
@@ -154,6 +155,20 @@ struct console {
 	u64	seq;
 	unsigned long dropped;
 	struct task_struct *thread;
+	bool	blocked;
+
+	/*
+	 * The per-console lock is used by printing kthreads to synchronize
+	 * this console with callers of console_lock(). This is necessary in
+	 * order to allow printing kthreads to run in parallel to each other,
+	 * while each safely accessing the @blocked field and synchronizing
+	 * against direct printing via console_lock/console_unlock.
+	 *
+	 * Note: For synchronizing against direct printing via
+	 *       console_trylock/console_unlock, see the static global
+	 *       variable @console_kthreads_active.
+	 */
+	struct mutex lock;
 
 	void	*data;
 	struct	 console *next;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e4cdc424c826..750d1229cc11 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -223,6 +223,33 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 /* Number of registered extended console drivers. */
 static int nr_ext_console_drivers;
 
+/*
+ * Used to synchronize printing kthreads against direct printing via
+ * console_trylock/console_unlock.
+ *
+ * Values:
+ * -1 = console kthreads atomically blocked (via global trylock)
+ *  0 = no kthread printing, console not locked (via trylock)
+ * >0 = kthread(s) actively printing
+ *
+ * Note: For synchronizing against direct printing via
+ *       console_lock/console_unlock, see the @lock variable in
+ *       struct console.
+ */
+static atomic_t console_kthreads_active = ATOMIC_INIT(0);
+
+#define console_kthreads_atomic_tryblock() \
+	(atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0)
+#define console_kthreads_atomic_unblock() \
+	atomic_cmpxchg(&console_kthreads_active, -1, 0)
+#define console_kthreads_atomically_blocked() \
+	(atomic_read(&console_kthreads_active) == -1)
+
+#define console_kthread_printing_tryenter() \
+	atomic_inc_unless_negative(&console_kthreads_active)
+#define console_kthread_printing_exit() \
+	atomic_dec(&console_kthreads_active)
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -270,6 +297,49 @@ static bool panic_in_progress(void)
 	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 }
 
+/*
+ * Tracks whether kthread printers are all blocked. A value of true implies
+ * that the console is locked via console_lock() or the console is suspended.
+ * Writing to this variable requires holding @console_sem.
+ */
+static bool console_kthreads_blocked;
+
+/*
+ * Block all kthread printers from a schedulable context.
+ *
+ * Requires holding @console_sem.
+ */
+static void console_kthreads_block(void)
+{
+	struct console *con;
+
+	for_each_console(con) {
+		mutex_lock(&con->lock);
+		con->blocked = true;
+		mutex_unlock(&con->lock);
+	}
+
+	console_kthreads_blocked = true;
+}
+
+/*
+ * Unblock all kthread printers from a schedulable context.
+ *
+ * Requires holding @console_sem.
+ */
+static void console_kthreads_unblock(void)
+{
+	struct console *con;
+
+	for_each_console(con) {
+		mutex_lock(&con->lock);
+		con->blocked = false;
+		mutex_unlock(&con->lock);
+	}
+
+	console_kthreads_blocked = false;
+}
+
 /*
  * This is used for debugging the mess that is the VT code by
  * keeping track if we have the console semaphore held. It's
@@ -2603,13 +2673,6 @@ void resume_console(void)
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
-
-	/*
-	 * While suspended, new records may have been added to the
-	 * ringbuffer. Wake up the kthread printers to print them.
-	 */
-	wake_up_klogd();
-
 	pr_flush(1000, true);
 }
 
@@ -2628,9 +2691,14 @@ static int console_cpu_notify(unsigned int cpu)
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
-
-		/* Wake kthread printers. Some may have become usable. */
-		wake_up_klogd();
+		else {
+			/*
+			 * If a new CPU comes online, the conditions for
+			 * printer_should_wake() may have changed for some
+			 * kthread printer with !CON_ANYTIME.
+			 */
+			wake_up_klogd();
+		}
 	}
 	return 0;
 }
@@ -2650,6 +2718,7 @@ void console_lock(void)
 	down_console_sem();
 	if (console_suspended)
 		return;
+	console_kthreads_block();
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2671,6 +2740,10 @@ int console_trylock(void)
 		up_console_sem();
 		return 0;
 	}
+	if (!console_kthreads_atomic_tryblock()) {
+		up_console_sem();
+		return 0;
+	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -2679,7 +2752,7 @@ EXPORT_SYMBOL(console_trylock);
 
 int is_console_locked(void)
 {
-	return console_locked;
+	return (console_locked || atomic_read(&console_kthreads_active));
 }
 EXPORT_SYMBOL(is_console_locked);
 
@@ -2723,7 +2796,7 @@ static inline bool __console_is_usable(short flags)
  * Check if the given console is currently capable and allowed to print
  * records.
  *
- * Requires the console_lock.
+ * Requires holding the console_lock.
  */
 static inline bool console_is_usable(struct console *con)
 {
@@ -2736,6 +2809,22 @@ static inline bool console_is_usable(struct console *con)
 static void __console_unlock(void)
 {
 	console_locked = 0;
+
+	/*
+	 * Depending on whether console_lock() or console_trylock() was used,
+	 * appropriately allow the kthread printers to continue.
+	 */
+	if (console_kthreads_blocked)
+		console_kthreads_unblock();
+	else
+		console_kthreads_atomic_unblock();
+
+	/*
+	 * New records may have arrived while the console was locked.
+	 * Wake the kthread printers to print them.
+	 */
+	wake_up_klogd();
+
 	up_console_sem();
 }
 
@@ -2753,17 +2842,19 @@ static void __console_unlock(void)
  *
  * @handover will be set to true if a printk waiter has taken over the
  * console_lock, in which case the caller is no longer holding the
- * console_lock. Otherwise it is set to false.
+ * console_lock. Otherwise it is set to false. A NULL pointer may be provided
+ * to disable allowing the console_lock to be taken over by a printk waiter.
  *
  * Returns false if the given console has no next record to print, otherwise
  * true.
  *
- * Requires the console_lock.
+ * Requires the console_lock if @handover is non-NULL.
+ * Requires con->lock otherwise.
  */
-static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
-				     char *dropped_text, bool *handover)
+static bool __console_emit_next_record(struct console *con, char *text, char *ext_text,
+				       char *dropped_text, bool *handover)
 {
-	static int panic_console_dropped;
+	static atomic_t panic_console_dropped = ATOMIC_INIT(0);
 	struct printk_info info;
 	struct printk_record r;
 	unsigned long flags;
@@ -2772,7 +2863,8 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 
 	prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
-	*handover = false;
+	if (handover)
+		*handover = false;
 
 	if (!prb_read_valid(prb, con->seq, &r))
 		return false;
@@ -2780,7 +2872,8 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 	if (con->seq != r.info->seq) {
 		con->dropped += r.info->seq - con->seq;
 		con->seq = r.info->seq;
-		if (panic_in_progress() && panic_console_dropped++ > 10) {
+		if (panic_in_progress() &&
+		    atomic_fetch_inc_relaxed(&panic_console_dropped) > 10) {
 			suppress_panic_printk = 1;
 			pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
 		}
@@ -2802,31 +2895,61 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
 		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
 	}
 
-	/*
-	 * While actively printing out messages, if another printk()
-	 * were to occur on another CPU, it may wait for this one to
-	 * finish. This task can not be preempted if there is a
-	 * waiter waiting to take over.
-	 *
-	 * Interrupts are disabled because the hand over to a waiter
-	 * must not be interrupted until the hand over is completed
-	 * (@console_waiter is cleared).
-	 */
-	printk_safe_enter_irqsave(flags);
-	console_lock_spinning_enable();
+	if (handover) {
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 *
+		 * Interrupts are disabled because the hand over to a waiter
+		 * must not be interrupted until the hand over is completed
+		 * (@console_waiter is cleared).
+		 */
+		printk_safe_enter_irqsave(flags);
+		console_lock_spinning_enable();
+
+		/* don't trace irqsoff print latency */
+		stop_critical_timings();
+	}
 
-	stop_critical_timings();	/* don't trace print latency */
 	call_console_driver(con, write_text, len, dropped_text);
-	start_critical_timings();
 
 	con->seq++;
 
-	*handover = console_lock_spinning_disable_and_check();
-	printk_safe_exit_irqrestore(flags);
+	if (handover) {
+		start_critical_timings();
+		*handover = console_lock_spinning_disable_and_check();
+		printk_safe_exit_irqrestore(flags);
+	}
 skip:
 	return true;
 }
 
+/*
+ * Print a record for a given console, but allow another printk() caller to
+ * take over the console_lock and continue printing.
+ *
+ * Requires the console_lock, but depending on @handover after the call, the
+ * caller may no longer have the console_lock.
+ *
+ * See __console_emit_next_record() for argument and return details.
+ */
+static bool console_emit_next_record_transferable(struct console *con, char *text, char *ext_text,
+						  char *dropped_text, bool *handover)
+{
+	/*
+	 * Handovers are only supported if threaded printers are atomically
+	 * blocked. The context taking over the console_lock may be atomic.
+	 */
+	if (!console_kthreads_atomically_blocked()) {
+		*handover = false;
+		handover = NULL;
+	}
+
+	return __console_emit_next_record(con, text, ext_text, dropped_text, handover);
+}
+
 /*
  * Print out all remaining records to all consoles.
  *
@@ -2878,13 +3001,11 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 
 			if (con->flags & CON_EXTENDED) {
 				/* Extended consoles do not print "dropped messages". */
-				progress = console_emit_next_record(con, &text[0],
-								    &ext_text[0], NULL,
-								    handover);
+				progress = console_emit_next_record_transferable(con, &text[0],
+								&ext_text[0], NULL, handover);
 			} else {
-				progress = console_emit_next_record(con, &text[0],
-								    NULL, &dropped_text[0],
-								    handover);
+				progress = console_emit_next_record_transferable(con, &text[0],
+								NULL, &dropped_text[0], handover);
 			}
 			if (*handover)
 				return false;
@@ -2999,6 +3120,10 @@ void console_unblank(void)
 	if (oops_in_progress) {
 		if (down_trylock_console_sem() != 0)
 			return;
+		if (!console_kthreads_atomic_tryblock()) {
+			up_console_sem();
+			return;
+		}
 	} else
 		console_lock();
 
@@ -3081,10 +3206,6 @@ void console_start(struct console *console)
 	console_lock();
 	console->flags |= CON_ENABLED;
 	console_unlock();
-
-	/* Wake the newly enabled kthread printer. */
-	wake_up_klogd();
-
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -3286,6 +3407,8 @@ void register_console(struct console *newcon)
 
 	newcon->dropped = 0;
 	newcon->thread = NULL;
+	newcon->blocked = true;
+	mutex_init(&newcon->lock);
 
 	if (newcon->flags & CON_PRINTBUFFER) {
 		/* Get a consistent copy of @syslog_seq. */
@@ -3586,6 +3709,19 @@ static void printk_fallback_preferred_direct(void)
 	console_unlock();
 }
 
+/*
+ * Print a record for a given console, not allowing another printk() caller
+ * to take over. This is appropriate for contexts that do not have the
+ * console_lock.
+ *
+ * See __console_emit_next_record() for argument and return details.
+ */
+static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
+				     char *dropped_text)
+{
+	return __console_emit_next_record(con, text, ext_text, dropped_text, NULL);
+}
+
 static bool printer_should_wake(struct console *con, u64 seq)
 {
 	short flags;
@@ -3593,8 +3729,10 @@ static bool printer_should_wake(struct console *con, u64 seq)
 	if (kthread_should_stop() || !printk_kthreads_available)
 		return true;
 
-	if (console_suspended)
+	if (con->blocked ||
+	    console_kthreads_atomically_blocked()) {
 		return false;
+	}
 
 	/*
 	 * This is an unsafe read from con->flags, but a false positive is
@@ -3615,7 +3753,6 @@ static int printk_kthread_func(void *data)
 	struct console *con = data;
 	char *dropped_text = NULL;
 	char *ext_text = NULL;
-	bool handover;
 	u64 seq = 0;
 	char *text;
 	int error;
@@ -3665,15 +3802,27 @@ static int printk_kthread_func(void *data)
 		if (error)
 			continue;
 
-		console_lock();
+		error = mutex_lock_interruptible(&con->lock);
+		if (error)
+			continue;
 
-		if (console_suspended) {
-			up_console_sem();
+		if (con->blocked ||
+		    !console_kthread_printing_tryenter()) {
+			/* Another context has locked the console_lock. */
+			mutex_unlock(&con->lock);
 			continue;
 		}
 
-		if (!console_is_usable(con)) {
-			__console_unlock();
+		/*
+		 * Although this context has not locked the console_lock, it
+		 * is known that the console_lock is not locked and it is not
+		 * possible for any other context to lock the console_lock.
+		 * Therefore it is safe to read con->flags.
+		 */
+
+		if (!__console_is_usable(con->flags)) {
+			console_kthread_printing_exit();
+			mutex_unlock(&con->lock);
 			continue;
 		}
 
@@ -3686,13 +3835,13 @@ static int printk_kthread_func(void *data)
 		 * which can conditionally invoke cond_resched().
 		 */
 		console_may_schedule = 0;
-		console_emit_next_record(con, text, ext_text, dropped_text, &handover);
-		if (handover)
-			continue;
+		console_emit_next_record(con, text, ext_text, dropped_text);
 
 		seq = con->seq;
 
-		__console_unlock();
+		console_kthread_printing_exit();
+
+		mutex_unlock(&con->lock);
 	}
 
 	con_printk(KERN_INFO, con, "printing thread stopped\n");

base-commit: 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7
-- 
2.30.2


* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-25 20:58   ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking John Ogness
@ 2022-04-26 12:07     ` Petr Mladek
  2022-04-26 13:16       ` Petr Mladek
  2022-06-22  9:03       ` Geert Uytterhoeven
  1 sibling, 1 reply; 99+ messages in thread
From: Petr Mladek @ 2022-04-26 12:07 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Mon 2022-04-25 23:04:28, John Ogness wrote:
> Currently threaded console printers synchronize against each
> other using console_lock(). However, different console drivers
> are unrelated and do not require any synchronization between
> each other. Removing the synchronization between the threaded
> console printers will allow each console to print at its own
> speed.
> 
> But the threaded consoles printers do still need to synchronize
> against console_lock() callers. Introduce a per-console mutex
> and a new console boolean field @blocked to provide this
> synchronization.
> 
> console_lock() is modified so that it must acquire the mutex
> of each console in order to set the @blocked field. Console
> printing threads will acquire their mutex while printing a
> record. If @blocked was set, the thread will go back to sleep
> instead of printing.
> 
> The reason for the @blocked boolean field is so that
> console_lock() callers do not need to acquire multiple console
> mutexes simultaneously, which would introduce unnecessary
> complexity due to nested mutex locking. Also, a new field
> was chosen instead of adding a new @flags value so that the
> blocked status could be checked without concern of reading
> inconsistent values due to @flags updates from other contexts.
> 
> Threaded console printers also need to synchronize against
> console_trylock() callers. Since console_trylock() may be
> called from any context, the per-console mutex cannot be used
> for this synchronization. (mutex_trylock() cannot be called
> from atomic contexts.) Introduce a global atomic counter to
> identify if any threaded printers are active. The threaded
> printers will also check the atomic counter to identify if the
> console has been locked by another task via console_trylock().
> 
> Note that @console_sem is still used to provide synchronization
> between console_lock() and console_trylock() callers.
> 
> A locking overview for console_lock(), console_trylock(), and the
> threaded printers is as follows (pseudo code):
> 
> console_lock()
> {
>         down(&console_sem);
>         for_each_console(con) {
>                 mutex_lock(&con->lock);
>                 con->blocked = true;
>                 mutex_unlock(&con->lock);
>         }
>         /* console_lock acquired */
> }
> 
> console_trylock()
> {
>         if (down_trylock(&console_sem) == 0) {
>                 if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>                         /* console_lock acquired */
>                 }
>         }
> }
> 
> threaded_printer()
> {
>         mutex_lock(&con->lock);
>         if (!con->blocked) {
> 		/* console_lock() callers blocked */
> 
>                 if (atomic_inc_unless_negative(&console_kthreads_active)) {
>                         /* console_trylock() callers blocked */
> 
>                         con->write();
> 
>                         atomic_dec(&console_kthreads_active);
>                 }
>         }
>         mutex_unlock(&con->lock);
> }
> 
> The console owner and waiter logic now only applies between contexts
> that have taken the console_lock via console_trylock(). Threaded
> printers never take the console_lock, so they do not have a
> console_lock to handover. Tasks that have used console_lock() will
> block the threaded printers using a mutex and if the console_lock
> is handed over to an atomic context, it would be unable to unblock
> the threaded printers. However, the console_trylock() case is
> really the only scenario that is interesting for handovers anyway.
> 
> @panic_console_dropped must change to atomic_t since it is no longer
> protected exclusively by the console_lock.
> 
> Since threaded printers remain asleep if they see that the console
> is locked, they now must be explicitly woken in __console_unlock().
> This means wake_up_klogd() calls following a console_unlock() are
> no longer necessary and are removed.
> 
> Also note that threaded printers no longer need to check
> @console_suspended. The check for the @blocked field implicitly
> covers the suspended console case.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Nice, it is better than v4. I am going to push this for linux-next.

Reviewed-by: Petr Mladek <pmladek@suse.com>

See below a comment about the possible future direction.

> ---
> 
>  Changes since v4 of this patch:
> 
>  - Use new @blocked field instead of CON_THD_BLOCKED flag.
> 
>  - Remove console_flags_set()/console_flags_clear() macros for
>    updating @flags (and remove their race comments).
> 
>  - For printer_should_wake() and printk_kthread_func(), check
>    @blocked before checking @flags.
> 
>  - Update commit message and comments appropriately.

Excellent work!

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3665,15 +3802,27 @@ static int printk_kthread_func(void *data)
>  		if (error)
>  			continue;
>  
> -		console_lock();
> +		error = mutex_lock_interruptible(&con->lock);
> +		if (error)
> +			continue;
>  
> -		if (console_suspended) {
> -			up_console_sem();
> +		if (con->blocked ||
> +		    !console_kthread_printing_tryenter()) {

It is great that you moved both conditions. I have just realized how
much information and functionality is accumulated here:

    + "con->blocked" is set when anyone else took @console_sem via
      console_lock() or when the console is suspended.

    + console_kthread_printing_tryenter() has two functions. It fails
      when anyone else took @console_sem via console_trylock().
      Also it blocks console_trylock(). Note that console_lock() is
      blocked because it has to wait for con->lock.
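
For illustration only, the two roles of the global counter can be
modelled in userspace with C11 atomics. This is a sketch, not the
kernel code: the function names are made up for the model, and only
the counter name and its encoding (positive while printers are active,
-1 while a console_trylock() owner holds the lock) are taken from the
pseudo code in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of the global counter (illustration only):
 *   > 0   number of printing threads currently inside a print
 *   == 0  idle
 *   == -1 console_lock is held via console_trylock()
 */
static atomic_int console_kthreads_active;

/* Models atomic_inc_unless_negative(): a printer announces itself. */
bool model_printing_tryenter(void)
{
	int old = atomic_load(&console_kthreads_active);

	do {
		if (old < 0)
			return false;	/* a trylock owner blocks printers */
	} while (!atomic_compare_exchange_weak(&console_kthreads_active,
					       &old, old + 1));
	return true;
}

/* Must pair with a successful model_printing_tryenter(). */
void model_printing_exit(void)
{
	int prev = atomic_fetch_sub(&console_kthreads_active, 1);

	assert(prev > 0);
	(void)prev;
}

/* Models atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0. */
bool model_trylock_block_printers(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&console_kthreads_active,
					      &expected, -1);
}

/* Releases the trylock side again (-1 back to 0). */
void model_trylock_unblock_printers(void)
{
	int expected = -1;

	atomic_compare_exchange_strong(&console_kthreads_active,
				       &expected, 0);
}
```

In this model a trylock caller succeeds only when no printer is
active, and a printer succeeds only when no trylock caller owns the
counter, which is the mutual exclusion described in the second bullet
above.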

I missed the trylock part when I proposed the safer API in the other
thread, see https://lore.kernel.org/r/YmKnp3Ccu7laW3E4@alley

The safe single console lock would need to do something like:

/*
 * Safe way to take con->lock. It makes sure that @console_sem is
 * not taken and blocks anyone from taking @console_sem.
 */
void single_console_lock(struct console *con)
{
	int error;

try_again:
	error = wait_event_interruptible(con->lock_wait,
			(!con->blocked &&
			 !console_kthreads_atomically_blocked()));

	/* Interrupted by a signal; wait again. */
	if (error)
		goto try_again;

	mutex_lock(&con->lock);

	/*
	 * Check if the console is blocked by @console_sem taken via
	 * console_lock() or if it is suspended.
	 */
	if (con->blocked) {
		mutex_unlock(&con->lock);
		goto try_again;
	}

	/*
	 * Try to block console_trylock(). Otherwise, we are blocked by
	 * @console_sem taken via console_trylock().
	 */
	if (!console_kthread_printing_tryenter()) {
		mutex_unlock(&con->lock);
		goto try_again;
	}

	/*
	 * Eureka! We own @con->lock and both console_lock() and
	 * console_trylock() are blocked.
	 */
}
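
The @blocked half of the scheme can be modelled in userspace as well.
The following is only a sketch under stated assumptions: struct
model_console and all model_*() helpers are hypothetical, pthread
mutexes stand in for con->lock, and @console_sem and the atomic
counter are left out. It shows why console_lock() never needs to hold
more than one console mutex at a time.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct console used here. */
struct model_console {
	pthread_mutex_t lock;	/* models con->lock */
	bool blocked;		/* models con->blocked */
	int records_printed;	/* stands in for con->write() */
};

void model_console_init(struct model_console *con)
{
	pthread_mutex_init(&con->lock, NULL);
	con->blocked = false;
	con->records_printed = 0;
}

/* Helper: set @blocked on every console, one mutex at a time. */
static void model_set_blocked(struct model_console *cons, size_t n,
			      bool blocked)
{
	for (size_t i = 0; i < n; i++) {
		pthread_mutex_lock(&cons[i].lock);
		cons[i].blocked = blocked;
		pthread_mutex_unlock(&cons[i].lock);
	}
}

/* Models the console_lock() side: no nested mutex locking needed. */
void model_console_lock_all(struct model_console *cons, size_t n)
{
	model_set_blocked(cons, n, true);
}

/* Models the unlock side: clear @blocked so printers may run again. */
void model_console_unlock_all(struct model_console *cons, size_t n)
{
	model_set_blocked(cons, n, false);
}

/* Models one printer iteration; returns true if a record was printed. */
bool model_printer_try_print(struct model_console *con)
{
	bool printed = false;

	pthread_mutex_lock(&con->lock);
	if (!con->blocked) {
		con->records_printed++;
		printed = true;
	}
	pthread_mutex_unlock(&con->lock);

	return printed;
}
```

Because a printer holds its mutex for the whole print, the lock side
cannot set @blocked in the middle of a record. It waits for that one
record to finish and then moves on to the next console.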

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-26 12:07     ` Petr Mladek
@ 2022-04-26 13:16       ` Petr Mladek
       [not found]         ` <CGME20220427070833eucas1p27a32ce7c41c0da26f05bd52155f0031c@eucas1p2.samsung.com>
  0 siblings, 1 reply; 99+ messages in thread
From: Petr Mladek @ 2022-04-26 13:16 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman

On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
> On Mon 2022-04-25 23:04:28, John Ogness wrote:
> > Currently threaded console printers synchronize against each
> > other using console_lock(). However, different console drivers
> > are unrelated and do not require any synchronization between
> > each other. Removing the synchronization between the threaded
> > console printers will allow each console to print at its own
> > speed.
> > 
> > But the threaded consoles printers do still need to synchronize
> > against console_lock() callers. Introduce a per-console mutex
> > and a new console boolean field @blocked to provide this
> > synchronization.
> > 
> > console_lock() is modified so that it must acquire the mutex
> > of each console in order to set the @blocked field. Console
> > printing threads will acquire their mutex while printing a
> > record. If @blocked was set, the thread will go back to sleep
> > instead of printing.
> > 
> > The reason for the @blocked boolean field is so that
> > console_lock() callers do not need to acquire multiple console
> > mutexes simultaneously, which would introduce unnecessary
> > complexity due to nested mutex locking. Also, a new field
> > was chosen instead of adding a new @flags value so that the
> > blocked status could be checked without concern of reading
> > inconsistent values due to @flags updates from other contexts.
> > 
> > Threaded console printers also need to synchronize against
> > console_trylock() callers. Since console_trylock() may be
> > called from any context, the per-console mutex cannot be used
> > for this synchronization. (mutex_trylock() cannot be called
> > from atomic contexts.) Introduce a global atomic counter to
> > identify if any threaded printers are active. The threaded
> > printers will also check the atomic counter to identify if the
> > console has been locked by another task via console_trylock().
> > 
> > Note that @console_sem is still used to provide synchronization
> > between console_lock() and console_trylock() callers.
> > 
> > A locking overview for console_lock(), console_trylock(), and the
> > threaded printers is as follows (pseudo code):
> > 
> > console_lock()
> > {
> >         down(&console_sem);
> >         for_each_console(con) {
> >                 mutex_lock(&con->lock);
> >                 con->blocked = true;
> >                 mutex_unlock(&con->lock);
> >         }
> >         /* console_lock acquired */
> > }
> > 
> > console_trylock()
> > {
> >         if (down_trylock(&console_sem) == 0) {
> >                 if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
> >                         /* console_lock acquired */
> >                 }
> >         }
> > }
> > 
> > threaded_printer()
> > {
> >         mutex_lock(&con->lock);
> >         if (!con->blocked) {
> > 		/* console_lock() callers blocked */
> > 
> >                 if (atomic_inc_unless_negative(&console_kthreads_active)) {
> >                         /* console_trylock() callers blocked */
> > 
> >                         con->write();
> > 
> >                         atomic_dec(&console_kthreads_active);
> >                 }
> >         }
> >         mutex_unlock(&con->lock);
> > }
> > 
> > The console owner and waiter logic now only applies between contexts
> > that have taken the console_lock via console_trylock(). Threaded
> > printers never take the console_lock, so they do not have a
> > console_lock to handover. Tasks that have used console_lock() will
> > block the threaded printers using a mutex and if the console_lock
> > is handed over to an atomic context, it would be unable to unblock
> > the threaded printers. However, the console_trylock() case is
> > really the only scenario that is interesting for handovers anyway.
> > 
> > @panic_console_dropped must change to atomic_t since it is no longer
> > protected exclusively by the console_lock.
> > 
> > Since threaded printers remain asleep if they see that the console
> > is locked, they now must be explicitly woken in __console_unlock().
> > This means wake_up_klogd() calls following a console_unlock() are
> > no longer necessary and are removed.
> > 
> > Also note that threaded printers no longer need to check
> > @console_suspended. The check for the @blocked field implicitly
> > covers the suspended console case.
> > 
> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> 
> Nice, it is better than v4. I am going to push this for linux-next.
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>

JFYI, I have just pushed this patch instead of the one
from v4 into printk/linux.git, branch rework/kthreads.

It means that this branch has been rebased. It will be
used in the next refresh of linux-next.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
       [not found]         ` <CGME20220427070833eucas1p27a32ce7c41c0da26f05bd52155f0031c@eucas1p2.samsung.com>
@ 2022-04-27  7:08             ` Marek Szyprowski
  0 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-04-27  7:08 UTC (permalink / raw)
  To: Petr Mladek, John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 26.04.2022 15:16, Petr Mladek wrote:
> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>> Currently threaded console printers synchronize against each
>>> other using console_lock(). However, different console drivers
>>> are unrelated and do not require any synchronization between
>>> each other. Removing the synchronization between the threaded
>>> console printers will allow each console to print at its own
>>> speed.
>>>
>>> But the threaded consoles printers do still need to synchronize
>>> against console_lock() callers. Introduce a per-console mutex
>>> and a new console boolean field @blocked to provide this
>>> synchronization.
>>>
>>> console_lock() is modified so that it must acquire the mutex
>>> of each console in order to set the @blocked field. Console
>>> printing threads will acquire their mutex while printing a
>>> record. If @blocked was set, the thread will go back to sleep
>>> instead of printing.
>>>
>>> The reason for the @blocked boolean field is so that
>>> console_lock() callers do not need to acquire multiple console
>>> mutexes simultaneously, which would introduce unnecessary
>>> complexity due to nested mutex locking. Also, a new field
>>> was chosen instead of adding a new @flags value so that the
>>> blocked status could be checked without concern of reading
>>> inconsistent values due to @flags updates from other contexts.
>>>
>>> Threaded console printers also need to synchronize against
>>> console_trylock() callers. Since console_trylock() may be
>>> called from any context, the per-console mutex cannot be used
>>> for this synchronization. (mutex_trylock() cannot be called
>>> from atomic contexts.) Introduce a global atomic counter to
>>> identify if any threaded printers are active. The threaded
>>> printers will also check the atomic counter to identify if the
>>> console has been locked by another task via console_trylock().
>>>
>>> Note that @console_sem is still used to provide synchronization
>>> between console_lock() and console_trylock() callers.
>>>
>>> A locking overview for console_lock(), console_trylock(), and the
>>> threaded printers is as follows (pseudo code):
>>>
>>> console_lock()
>>> {
>>>          down(&console_sem);
>>>          for_each_console(con) {
>>>                  mutex_lock(&con->lock);
>>>                  con->blocked = true;
>>>                  mutex_unlock(&con->lock);
>>>          }
>>>          /* console_lock acquired */
>>> }
>>>
>>> console_trylock()
>>> {
>>>          if (down_trylock(&console_sem) == 0) {
>>>                  if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>>>                          /* console_lock acquired */
>>>                  }
>>>          }
>>> }
>>>
>>> threaded_printer()
>>> {
>>>          mutex_lock(&con->lock);
>>>          if (!con->blocked) {
>>> 		/* console_lock() callers blocked */
>>>
>>>                  if (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>                          /* console_trylock() callers blocked */
>>>
>>>                          con->write();
>>>
>>>                          atomic_dec(&console_kthreads_active);
>>>                  }
>>>          }
>>>          mutex_unlock(&con->lock);
>>> }
>>>
>>> The console owner and waiter logic now only applies between contexts
>>> that have taken the console_lock via console_trylock(). Threaded
>>> printers never take the console_lock, so they do not have a
>>> console_lock to handover. Tasks that have used console_lock() will
>>> block the threaded printers using a mutex and if the console_lock
>>> is handed over to an atomic context, it would be unable to unblock
>>> the threaded printers. However, the console_trylock() case is
>>> really the only scenario that is interesting for handovers anyway.
>>>
>>> @panic_console_dropped must change to atomic_t since it is no longer
>>> protected exclusively by the console_lock.
>>>
>>> Since threaded printers remain asleep if they see that the console
>>> is locked, they now must be explicitly woken in __console_unlock().
>>> This means wake_up_klogd() calls following a console_unlock() are
>>> no longer necessary and are removed.
>>>
>>> Also note that threaded printers no longer need to check
>>> @console_suspended. The check for the @blocked field implicitly
>>> covers the suspended console case.
>>>
>>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>> Nice, it is better than v4. I am going to push this for linux-next.
>>
>> Reviewed-by: Petr Mladek <pmladek@suse.com>
> JFYI, I have just pushed this patch instead of the one
> from v4 into printk/linux.git, branch rework/kthreads.
>
> It means that this branch has been rebased. It will be
> used in the next refresh of linux-next.

This patchset landed in linux next-20220426. In my tests I've found that
it causes a deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid
C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when the system boots to
userspace and getty (with automated login) is executed. I even see the
bash prompt, but then the console freezes. Reverting this patch
(e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make
the revert clean) fixes the issue.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27  7:08             ` Marek Szyprowski
@ 2022-04-27  7:38               ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-27  7:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 09:08:33, Marek Szyprowski wrote:
> Hi,
> 
> On 26.04.2022 15:16, Petr Mladek wrote:
> > On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
> >> On Mon 2022-04-25 23:04:28, John Ogness wrote:
> >>> Currently threaded console printers synchronize against each
> >>> other using console_lock(). However, different console drivers
> >>> are unrelated and do not require any synchronization between
> >>> each other. Removing the synchronization between the threaded
> >>> console printers will allow each console to print at its own
> >>> speed.
> >>>
> >>> But the threaded consoles printers do still need to synchronize
> >>> against console_lock() callers. Introduce a per-console mutex
> >>> and a new console boolean field @blocked to provide this
> >>> synchronization.
> >>>
> >>> console_lock() is modified so that it must acquire the mutex
> >>> of each console in order to set the @blocked field. Console
> >>> printing threads will acquire their mutex while printing a
> >>> record. If @blocked was set, the thread will go back to sleep
> >>> instead of printing.
> >>>
> >>> The reason for the @blocked boolean field is so that
> >>> console_lock() callers do not need to acquire multiple console
> >>> mutexes simultaneously, which would introduce unnecessary
> >>> complexity due to nested mutex locking. Also, a new field
> >>> was chosen instead of adding a new @flags value so that the
> >>> blocked status could be checked without concern of reading
> >>> inconsistent values due to @flags updates from other contexts.
> >>>
> >>> Threaded console printers also need to synchronize against
> >>> console_trylock() callers. Since console_trylock() may be
> >>> called from any context, the per-console mutex cannot be used
> >>> for this synchronization. (mutex_trylock() cannot be called
> >>> from atomic contexts.) Introduce a global atomic counter to
> >>> identify if any threaded printers are active. The threaded
> >>> printers will also check the atomic counter to identify if the
> >>> console has been locked by another task via console_trylock().
> >>>
> >>> Note that @console_sem is still used to provide synchronization
> >>> between console_lock() and console_trylock() callers.
> >>>
> >>> A locking overview for console_lock(), console_trylock(), and the
> >>> threaded printers is as follows (pseudo code):
> >>>
> >>> console_lock()
> >>> {
> >>>          down(&console_sem);
> >>>          for_each_console(con) {
> >>>                  mutex_lock(&con->lock);
> >>>                  con->blocked = true;
> >>>                  mutex_unlock(&con->lock);
> >>>          }
> >>>          /* console_lock acquired */
> >>> }
> >>>
> >>> console_trylock()
> >>> {
> >>>          if (down_trylock(&console_sem) == 0) {
> >>>                  if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
> >>>                          /* console_lock acquired */
> >>>                  }
> >>>          }
> >>> }
> >>>
> >>> threaded_printer()
> >>> {
> >>>          mutex_lock(&con->lock);
> >>>          if (!con->blocked) {
> >>> 		/* console_lock() callers blocked */
> >>>
> >>>                  if (atomic_inc_unless_negative(&console_kthreads_active)) {
> >>>                          /* console_trylock() callers blocked */
> >>>
> >>>                          con->write();
> >>>
> >>>                          atomic_dec(&console_kthreads_active);
> >>>                  }
> >>>          }
> >>>          mutex_unlock(&con->lock);
> >>> }
> >>>
> >>> The console owner and waiter logic now only applies between contexts
> >>> that have taken the console_lock via console_trylock(). Threaded
> >>> printers never take the console_lock, so they do not have a
> >>> console_lock to handover. Tasks that have used console_lock() will
> >>> block the threaded printers using a mutex and if the console_lock
> >>> is handed over to an atomic context, it would be unable to unblock
> >>> the threaded printers. However, the console_trylock() case is
> >>> really the only scenario that is interesting for handovers anyway.
> >>>
> >>> @panic_console_dropped must change to atomic_t since it is no longer
> >>> protected exclusively by the console_lock.
> >>>
> >>> Since threaded printers remain asleep if they see that the console
> >>> is locked, they now must be explicitly woken in __console_unlock().
> >>> This means wake_up_klogd() calls following a console_unlock() are
> >>> no longer necessary and are removed.
> >>>
> >>> Also note that threaded printers no longer need to check
> >>> @console_suspended. The check for the @blocked field implicitly
> >>> covers the suspended console case.
> >>>
> >>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> >> Nice, it is better than v4. I am going to push this for linux-next.
> >>
> >> Reviewed-by: Petr Mladek <pmladek@suse.com>
> > JFYI, I have just pushed this patch instead of the one
> > from v4 into printk/linux.git, branch rework/kthreads.
> >
> > It means that this branch has been rebased. It will be
> > used in the next refresh of linux-next.
> 
> This patchset landed in linux next-20220426. In my tests I've found that
> it causes a deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid
> C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when the system boots to
> userspace and getty (with automated login) is executed. I even see the
> bash prompt, but then the console freezes. Reverting this patch
> (e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make
> the revert clean) fixes the issue.

Thanks a lot for the report!

Just by chance, do you have the log from the deadlocked boot stored
in userspace, and can you share it? I mean the log stored in
/var/log/dmesg or available via journalctl.

In the worst case, it might help to see the log from a boot with
the patch reverted. It would help us to see the ordering of various
console-related operations on your system.

And regarding the console: is it the graphics console (ttyX),
a serial one (ttyS), or yet another one?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27  7:38               ` Petr Mladek
@ 2022-04-27 11:44                 ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-04-27 11:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 27.04.2022 09:38, Petr Mladek wrote:
> On Wed 2022-04-27 09:08:33, Marek Szyprowski wrote:
>> On 26.04.2022 15:16, Petr Mladek wrote:
>>> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>>>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>>>> Currently threaded console printers synchronize against each
>>>>> other using console_lock(). However, different console drivers
>>>>> are unrelated and do not require any synchronization between
>>>>> each other. Removing the synchronization between the threaded
>>>>> console printers will allow each console to print at its own
>>>>> speed.
>>>>>
>>>>> But the threaded console printers do still need to synchronize
>>>>> against console_lock() callers. Introduce a per-console mutex
>>>>> and a new console boolean field @blocked to provide this
>>>>> synchronization.
>>>>>
>>>>> console_lock() is modified so that it must acquire the mutex
>>>>> of each console in order to set the @blocked field. Console
>>>>> printing threads will acquire their mutex while printing a
>>>>> record. If @blocked was set, the thread will go back to sleep
>>>>> instead of printing.
>>>>>
>>>>> The reason for the @blocked boolean field is so that
>>>>> console_lock() callers do not need to acquire multiple console
>>>>> mutexes simultaneously, which would introduce unnecessary
>>>>> complexity due to nested mutex locking. Also, a new field
>>>>> was chosen instead of adding a new @flags value so that the
>>>>> blocked status could be checked without concern of reading
>>>>> inconsistent values due to @flags updates from other contexts.
>>>>>
>>>>> Threaded console printers also need to synchronize against
>>>>> console_trylock() callers. Since console_trylock() may be
>>>>> called from any context, the per-console mutex cannot be used
>>>>> for this synchronization. (mutex_trylock() cannot be called
>>>>> from atomic contexts.) Introduce a global atomic counter to
>>>>> identify if any threaded printers are active. The threaded
>>>>> printers will also check the atomic counter to identify if the
>>>>> console has been locked by another task via console_trylock().
>>>>>
>>>>> Note that @console_sem is still used to provide synchronization
>>>>> between console_lock() and console_trylock() callers.
>>>>>
>>>>> A locking overview for console_lock(), console_trylock(), and the
>>>>> threaded printers is as follows (pseudo code):
>>>>>
>>>>> console_lock()
>>>>> {
>>>>>           down(&console_sem);
>>>>>           for_each_console(con) {
>>>>>                   mutex_lock(&con->lock);
>>>>>                   con->blocked = true;
>>>>>                   mutex_unlock(&con->lock);
>>>>>           }
>>>>>           /* console_lock acquired */
>>>>> }
>>>>>
>>>>> console_trylock()
>>>>> {
>>>>>           if (down_trylock(&console_sem) == 0) {
>>>>>                   if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>>>>>                           /* console_lock acquired */
>>>>>                   }
>>>>>           }
>>>>> }
>>>>>
>>>>> threaded_printer()
>>>>> {
>>>>>           mutex_lock(&con->lock);
>>>>>           if (!con->blocked) {
>>>>>                   /* console_lock() callers blocked */
>>>>>
>>>>>                   if (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>>>                           /* console_trylock() callers blocked */
>>>>>
>>>>>                           con->write();
>>>>>
>>>>>                           atomic_dec(&console_kthreads_active);
>>>>>                   }
>>>>>           }
>>>>>           mutex_unlock(&con->lock);
>>>>> }
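
As a rough illustration of the counter handshake in the overview above,
here is a user-space C11 sketch (not kernel code). It models only
@console_kthreads_active: a positive value counts printers that are
currently printing, 0 means idle, and -1 means a console_trylock()
caller holds it. The model_*() names and the unlock helper are
hypothetical, since the overview does not show the unlock path, and the
decrement after con->write() is assumed to act on the same counter.
@console_sem, the per-console mutex and @blocked are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space model only (hypothetical names, not the kernel's code).
 *   > 0 : number of threaded printers currently printing
 *     0 : idle
 *    -1 : held by a console_trylock() caller
 */
static atomic_int console_kthreads_active;

/* Models atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0. */
static bool model_console_trylock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&console_kthreads_active,
					      &expected, -1);
}

/* Assumed unlock path: the trylock holder resets the counter to idle. */
static void model_console_unlock(void)
{
	atomic_store(&console_kthreads_active, 0);
}

/* Models atomic_inc_unless_negative(): increment unless the value is < 0. */
static bool model_printer_enter(void)
{
	int cur = atomic_load(&console_kthreads_active);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&console_kthreads_active,
						 &cur, cur + 1))
			return true;
	}
	return false;
}

/* Models the matching atomic_dec() after con->write(). */
static void model_printer_exit(void)
{
	int prev = atomic_fetch_sub(&console_kthreads_active, 1);

	assert(prev > 0);	/* only valid after a successful enter */
	(void)prev;
}
```

In this model, model_console_trylock() fails while any printer is
between enter and exit, and model_printer_enter() fails for as long as
the trylock holder keeps the counter at -1.
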
>>>>>
>>>>> The console owner and waiter logic now only applies between contexts
>>>>> that have taken the console_lock via console_trylock(). Threaded
>>>>> printers never take the console_lock, so they do not have a
>>>>> console_lock to handover. Tasks that have used console_lock() will
>>>>> block the threaded printers using a mutex and if the console_lock
>>>>> is handed over to an atomic context, it would be unable to unblock
>>>>> the threaded printers. However, the console_trylock() case is
>>>>> really the only scenario that is interesting for handovers anyway.
>>>>>
>>>>> @panic_console_dropped must change to atomic_t since it is no longer
>>>>> protected exclusively by the console_lock.
>>>>>
>>>>> Since threaded printers remain asleep if they see that the console
>>>>> is locked, they now must be explicitly woken in __console_unlock().
>>>>> This means wake_up_klogd() calls following a console_unlock() are
>>>>> no longer necessary and are removed.
>>>>>
>>>>> Also note that threaded printers no longer need to check
>>>>> @console_suspended. The check for the @blocked field implicitly
>>>>> covers the suspended console case.
>>>>>
>>>>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>>>> Nice, it is better than v4. I am going to push this for linux-next.
>>>>
>>>> Reviewed-by: Petr Mladek <pmladek@suse.com>
>>> JFYI, I have just pushed this patch instead of the one
>>> from v4 into printk/linux.git, branch rework/kthreads.
>>>
>>> It means that this branch has been rebased. It will be
>>> used in the next refresh of linux-next.
>> This patchset landed in linux next-20220426. In my tests I've found that
>> it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid
>> C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system boots to
>> userspace and getty (with automated login) is executed. I even see the
>> bash prompt, but then the console is frozen. Reverting this patch
>> (e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make
>> revert clean) fixes the issue.
> Thanks a lot for the report!
>
> By any chance, do you have the log from the deadlocked boot stored
> in userspace, and can you share it? I mean the log stored in
> /var/log/dmesg or available via journalctl.

If there were any messages, I would expect them to be visible on the
serial kernel console.

> In the worst case, it might help to see the log from a boot with
> the patch reverted. It would help us to see the ordering of various
> console-related operations on your system.
>
> And regarding the console. Is it the graphics console (ttyX)
> or a serial one (ttyS) or yet another one?

Serial console, /dev/ttyAML0 with kernel console enabled. Later a DRM 
driver is loaded (meson_drm), which initializes its fbdev emulation with 
its console.

However, it looks like I trusted the automatic bisect a bit too much and
had a bit of luck while checking the reverts on top of linux-next. The
issue is not 100% reproducible, so I did the bisection again
manually with more tries. The real commit causing the issue is
09c5ba0aa2fc ("printk: add kthread console printers"). Reverting the
following 3 commits 6b3d71e878920b085dd823bc422951bb6f143505,
e00cc0e1cbf4ea5a63d66c8de8d79519855fb231 and
09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7 on top of linux-next makes the
system fully operational again.

I've also tried to disable the DRM driver and its fbdev and console (by
adding modprobe.blacklist=meson_drm to the kernel cmdline), but this
didn't help. Here is the full serial console log:

https://pastebin.com/E5CDH88L

If there is anything you would like me to try, let me know.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 11:44                 ` Marek Szyprowski
@ 2022-04-27 16:15                   ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-27 16:15 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi Marek,

On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Here is the full serial console log:
>
> https://pastebin.com/E5CDH88L

Here are a few ideas from me:

1. For next-20220427 the printk-threaded series was slightly changed. I
do not expect it to work any differently, but I would prefer that we
debug the current version. If possible, could you move to
next-20220427?

2. I noticed you boot with the kernel boot arguments "earlycon" and
"no_console_suspend". Could you try booting without them? I expect this
will make no difference.

3. It looks like the problem happens quite late in the boot process. I
expect it is due to some userspace process that is running that is
interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
problems. If you boot with init=/bin/sh then I expect the system is
running fine. (You don't have much of a system running, but it should
not hang.) We need to isolate which userspace process is triggering the
issue.

4. Have you tried issuing magic sysrq commands on the serial line? (For
example, sending a break signal and then the letter 't' or sending a
break signal and then the letter 'c'?) That might trigger various dumps
so that we can see the system state.

5. You are not running a VT console, so the graphics driver should not
be affecting the printk subsystem at all. I expect your autologin is
also starting various services and programs. If you disable the
automatic login and instead manually login (perhaps as another user) can
you manually start those services one at a time to see at what point the
system hangs?

Thanks for your help with this!

John Ogness

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                   ` John Ogness
@ 2022-04-27 16:48                     ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-27 16:48 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 18:21:16, John Ogness wrote:
> Hi Marek,
> 
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Here is the full serial console log:
> >
> > https://pastebin.com/E5CDH88L
> 
> Here are a few ideas from me:
> 
> 1. For next-20220427 the printk-threaded series was slightly changed. I
> do not expect it to work any differently, but I would prefer that we
> debug the current version. If possible, could you move to
> next-20220427?
> 
> 2. I noticed you boot with the kernel boot arguments "earlycon" and
> "no_console_suspend". Could you try booting without them? I expect this
> will make no difference.
> 
> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems. If you boot with init=/bin/sh then I expect the system is
> running fine. (You don't have much of a system running, but it should
> not hang.) We need to isolate which userspace process is triggering the
> issue.

Interesting idea.


> 4. Have you tried issuing magic sysrq commands on the serial line? (For
> example, sending a break signal and then the letter 't' or sending a
> break signal and then the letter 'c'?) That might trigger various dumps
> so that we can see the system state.

I see that sshd is started. If you are able to connect via ssh to the
system with the frozen login, then it might be easier to trigger
sysrq via procfs, for example:

  #> echo t >/proc/sysrq-trigger

"sysrq t" should print the state of all processes. It might show which
process is hanging and where.


> 5. You are not running a VT console, so the graphics driver should not
> be affecting the printk subsystem at all. I expect your autologin is
> also starting various services and programs. If you disable the
> automatic login and instead manually login (perhaps as another user) can
> you manually start those services one at a time to see at what point the
> system hangs?

Yeah, I am not able to reproduce it and some more clues would help.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                   ` John Ogness
@ 2022-04-28 14:54                     ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-04-28 14:54 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 18:21:16, John Ogness wrote:
> Hi Marek,
> 
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Here is the full serial console log:
> >
> > https://pastebin.com/E5CDH88L
> 
> Here are a few ideas from me:
> 
> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems.

I did not find any real issue in the code handling /dev/kmsg,
/proc/kmsg, or the syslog syscall API. There are just a few
small changes in behavior:

    1. There is an increased number of spurious wakeups because
       log_wait is shared between upstream readers and printk kthreads.
       And we newly wake up waiters from both vprintk_emit()
       and __console_unlock() code paths.

       It might especially affect the polling APIs: kmsg_poll() and
       devkmsg_poll(). They might return 0 more often than before.


    2. The 4th patch replaced wake_up_interruptible(&log_wait) with
       wake_up_interruptible_all(&log_wait). As a result, all
       readers are woken at the same time.

       It is perfectly fine because the log buffer is lockless.
       And all readers should be either independent or synchronized
       against each other.


None of the above changes should introduce new problems. But
they might make some old problem (race) more visible.

I spent quite some time reviewing the code and testing. But I neither
see any problem nor am I able to reproduce it. Some more clues
from Marek would be needed.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                   ` John Ogness
@ 2022-04-29 13:53                     ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-04-29 13:53 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi John,

On 27.04.2022 18:15, John Ogness wrote:
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Here is the full serial console log:
>>
>> https://pastebin.com/E5CDH88L
> Here are a few ideas from me:
>
> 1. For next-20220427 the printk-threaded series was slightly changed. I
> do not expect it to work any differently, but I would prefer we are
> debugging the current version. If possible, could you move to
> next-20220427?

I've moved to next-20220429. Nothing changed compared to next-20220427.


> 2. I noticed you boot with the kernel boot arguments "earlycon" and
> "no_console_suspend". Could you try booting without this? I expect this
> will make no difference.

Well, nothing changed.


> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems. If you boot with init=/bin/sh then I expect the system is
> running fine. (You don't have much of a system running, but it should
> not hang.) We need to isolate which userspace process is triggering the
> issue.

The same issue happens if I boot with init=/bin/bash


> 4. Have you tried issuing magic sysrq commands on the serial line? (For
> example, sending a break signal and then the letter 't' or sending a
> break signal and then the letter 'c'?) That might trigger various dumps
> so that we can see the system state.
>
> 5. You are not running a VT console, so the graphics driver should not
> be affecting the printk subsystem at all. I expect your autologin is
> also starting various services and programs. If you disable the
> automatic login and instead manually login (perhaps as another user) can
> you manually start those services one at a time to see at what point the
> system hangs?
>
> Thanks for your help with this!

I found something really interesting. When the lockup happens, I'm still
able to log in via ssh and trigger any magic sysrq action via
/proc/sysrq-trigger (triggering it from the UART console via break doesn't
work).

It turned out that the UART console is somehow blocked, but it receives
and buffers all the input. For example, after issuing
"echo >/proc/sysrq-trigger" from the ssh console, the UART console was
updated and I saw the magic sysrq banner and then all the commands I had
blindly typed in the UART console! However, this doesn't unblock the console.
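
For anyone wanting to repeat this from an ssh session, here is a minimal
sketch of a wrapper around /proc/sysrq-trigger. The function name sysrq and
its optional second argument (an alternative target file, handy for a dry
run) are illustrative assumptions, not anything from this series. Writing
to the real trigger file normally requires root.

```sh
# Sketch only: send one sysrq command key to the trigger file.
# "sysrq" and the optional target argument are hypothetical names.
sysrq() {
    # $1: a single command key, e.g. t (dump task states)
    # $2: target file, defaults to the real trigger
    case $1 in
        ?) printf '%s' "$1" > "${2:-/proc/sysrq-trigger}" ;;
        *) echo "sysrq: expected exactly one character" >&2; return 1 ;;
    esac
}
```

Usage would be something like "sysrq t" from the ssh console, then reading
the resulting dump with dmesg or on the serial line.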

Here is the output of the 't' magic sysrq request:

https://pastebin.com/fjbRuy4f

If you have any more suggestions on what to check, let me know.

This issue must be somehow related to the way the UART driver works on 
the Amlogic Meson boards. The other boards based on different SoCs 
(Exynos, QCOM, BCM) I have in my test farm (with the same userspace and 
configuration) work fine with those patches.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-29 13:53                     ` Marek Szyprowski
@ 2022-04-30 16:00                       ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-04-30 16:00 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> The same issue happens if I boot with init=/bin/bash

Very interesting. Since you are seeing all the output up until you try
doing something, I guess receiving UART data is triggering the issue.

> I found something really interesting. When lockup happens, I'm still
> able to log via ssh and trigger any magic sysrq action via
> /proc/sysrq-trigger.

If you boot the system and directly login via ssh (without sending any
data via serial), can you trigger printk output on the UART? For
example, with:

    echo hello > /dev/kmsg

(You might need to increase your loglevel to see it.)
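
A rough sketch of the loglevel check hinted at above, in case it helps.
It assumes the usual layout of /proc/sys/kernel/printk, where the first
field is the current console loglevel, and that a record reaches the
console only if its level is numerically lower than that value. The
function names and the optional file arguments are made up for
illustration.

```sh
# Sketch only; all function names here are hypothetical helpers.

console_loglevel() {
    # The first field of the printk sysctl is the console loglevel.
    read -r level _ < "${1:-/proc/sys/kernel/printk}"
    printf '%s\n' "$level"
}

kmsg_visible() {
    # $1: message level (0-7), $2: console loglevel.
    # Lower numbers are more severe; only levels below the console
    # loglevel are printed on the console.
    [ "$1" -lt "$2" ]
}

kmsg_write() {
    # $1: level, $2: text, $3: target (defaults to /dev/kmsg).
    # A "<N>" prefix sets the level of the injected record.
    printf '<%s>%s\n' "$1" "$2" > "${3:-/dev/kmsg}"
}
```

For example, "kmsg_visible 1 $(console_loglevel) && kmsg_write 1 hello"
should produce a line on the UART if the printing path works. If the
console loglevel is too low, it can be raised with "dmesg -n 8".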

> It turned out that the UART console is somehow blocked, but it
> receives and buffers all the input. For example after issuing "echo
>  >/proc/sysrq-trigger" from the ssh console, the UART console has been 
> updated and I see the magic sysrq banner and then all the commands I 
> blindly typed in the UART console! However this doesn't unblock the
> console.

sysrq falls back to direct printing. This would imply that the kthread
printer is somehow unable to print.

> Here is the output of 't' magic sys request:
>
> https://pastebin.com/fjbRuy4f

It looks like the call trace for the printing kthread (pr/ttyAML0) is
corrupt.

Could you post your kernel config?

John

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-30 16:00                       ` John Ogness
@ 2022-05-02  9:19                         ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-02  9:19 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi John,

On 30.04.2022 18:00, John Ogness wrote:
> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> The same issue happens if I boot with init=/bin/bash
> Very interesting. Since you are seeing all the output up until you try
> doing something, I guess receiving UART data is triggering the issue.

Right, this is what it looks like.

>> I found something really interesting. When lockup happens, I'm still
>> able to log via ssh and trigger any magic sysrq action via
>> /proc/sysrq-trigger.
> If you boot the system and directly login via ssh (without sending any
> data via serial), can you trigger printk output on the UART? For
> example, with:
>
>      echo hello > /dev/kmsg
>
> (You might need to increase your loglevel to see it.)

Data written to /dev/kmsg and all kernel logs were always displayed
correctly. Also, data written directly to /dev/ttyAML0 is displayed
properly on the console. The latter, however, doesn't trigger the
input-related activity.

It looks like the data read from the UART is delivered only if other
activity happens on the kernel console. If I type 'reboot' and press
enter, nothing happens immediately. If I then type 'date >/dev/ttyAML0'
via ssh, I only see the date printed on the console. However, if I type
'date >/dev/kmsg', the date is printed and the reboot happens.


>> It turned out that the UART console is somehow blocked, but it
>> receives and buffers all the input. For example after issuing "echo
>>   >/proc/sysrq-trigger" from the ssh console, the UART console has been
>> updated and I see the magic sysrq banner and then all the commands I
>> blindly typed in the UART console! However this doesn't unblock the
>> console.
> sysrq falls back to direct printing. This would imply that the kthread
> printer is somehow unable to print.
>
>> Here is the output of 't' magic sys request:
>>
>> https://pastebin.com/fjbRuy4f
> It looks like the call trace for the printing kthread (pr/ttyAML0) is
> corrupt.

Right, good catch. For comparison, here is a 't' sysrq result from the
'working' serial console (next-20220429), which usually happens in 1 of 4
boots:

https://pastebin.com/mp8zGFbW


> Could you post your kernel config?

https://pastebin.com/GUWGdCHX

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02  9:19                         ` Marek Szyprowski
@ 2022-05-02 13:11                           ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-02 13:11 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Data written to /dev/kmsg and all kernel logs were always displayed
> correctly. Also data written directly to /dev/ttyAML0 is displayed
> properly on the console. The latter doesn't however trigger the input
> related activity.
>
> It looks that the data read from the uart is delivered only if other
> activity happens on the kernel console. If I type 'reboot' and press
> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> ssh then, I only see the date printed on the console. However if I
> type 'date >/dev/kmsg', the date is printed and reboot happens.

I suppose if you log in via ssh and check /proc/interrupts, then type
some things over serial, then check /proc/interrupts again, you will see
there have been no interrupts for the uart. But interrupts for other
devices are happening. Is this correct?
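
One possible way to make that comparison less error-prone is sketched
below. It assumes the usual /proc/interrupts layout (a header line with
one column per CPU, then one line per interrupt with per-CPU counts
before the description). The function name irq_total is hypothetical,
and the string to match for the Meson UART line is an assumption; it has
to be taken from the actual file on the board.

```sh
# Sketch only: sum the per-CPU counts of every interrupt line that
# contains the given string. "irq_total" is a made-up helper name.
irq_total() {
    # $1: string to match, $2: file (defaults to /proc/interrupts)
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }      # header: one field per CPU
        index($0, name) {
            # field 1 is "NN:", fields 2..ncpu+1 are the counts
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}
```

Running it once before and once after typing on the serial line (for
example "irq_total ttyAML0", if that is how the line is labeled) and
comparing the two numbers would show whether UART interrupts arrive.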

> For comparison, here is a 't' sysrq result from the 'working' serial
> console (next-20220429), which happens usually 1 of 4 boots:
>
> https://pastebin.com/mp8zGFbW

This still looks odd to me. We should be seeing a trace originating from
ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.

I wonder if the early creation of the thread is somehow causing
problems. Could you try the following patch to see if it makes a
difference? I would also like to see the sysrq-t output with this patch
applied:

---------------- BEGIN PATCH ---------------
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2311a0ad584a..c4362d25de22 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
 
 	return 0;
 }
-early_initcall(printk_activate_kthreads);
+late_initcall(printk_activate_kthreads);
 
 #if defined CONFIG_PRINTK
 /* If @con is specified, only wait for that console. Otherwise wait for all. */
---------------- END PATCH ---------------

Thanks for your help with this!

John

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02  9:19                         ` Marek Szyprowski
@ 2022-05-02 13:17                           ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-05-02 13:17 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> Hi John,
> 
> On 30.04.2022 18:00, John Ogness wrote:
> > On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >> The same issue happens if I boot with init=/bin/bash
> > Very interesting. Since you are seeing all the output up until you try
> > doing something, I guess receiving UART data is triggering the issue.
> 
> > Right, this is what it looks like.
> 
> >> I found something really interesting. When lockup happens, I'm still
> >> able to log via ssh and trigger any magic sysrq action via
> >> /proc/sysrq-trigger.
> > If you boot the system and directly login via ssh (without sending any
> > data via serial), can you trigger printk output on the UART? For
> > example, with:
> >
> >      echo hello > /dev/kmsg
> >
> > (You might need to increase your loglevel to see it.)
> 
> Data written to /dev/kmsg and all kernel logs were always displayed 
> correctly. Also data written directly to /dev/ttyAML0 is displayed 
> properly on the console. The latter doesn't however trigger the input 
> related activity.
> 
> It looks that the data read from the uart is delivered only if other 
> activity happens on the kernel console. If I type 'reboot' and press 
> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via 
> ssh then, I only see the date printed on the console. However if I type 
> 'date >/dev/kmsg', the date is printed and reboot happens.

This is really interesting.

'date >/dev/kmsg' should be handled like a normal printk().
The message should get pushed to the console by the printk kthread,
which calls call_console_driver(), which in turn calls the con->write()
callback. In our case, that should be meson_serial_console_write().

I am not sure whether meson_serial_console_write() is also used
when writing via /dev/ttyAML0.

> 
> >> It turned out that the UART console is somehow blocked, but it
> >> receives and buffers all the input. For example after issuing "echo
> >>   >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >> updated and I see the magic sysrq banner and then all the commands I
> >> blindly typed in the UART console! However this doesn't unblock the
> >> console.
> > sysrq falls back to direct printing. This would imply that the kthread
> > printer is somehow unable to print.
> >
> >> Here is the output of 't' magic sys request:
> >>
> >> https://pastebin.com/fjbRuy4f
> > It looks like the call trace for the printing kthread (pr/ttyAML0) is
> > corrupt.
> 
> Right, good catch. For comparison, here is a 't' sysrq result from the 
> 'working' serial console (next-20220429), which happens usually 1 of 4 
> boots:
> 
> https://pastebin.com/mp8zGFbW

Strange. The backtrace is weird here too:

[   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
[   50.514540] Call trace:
[   50.514548]  __switch_to+0xe8/0x160
[   50.514561]  meson_serial_console+0x78/0x118

There should be kthread() and printk_kthread_func() on the stack.

Hmm, meson_serial_console+0x78/0x118 is weird on its own.
meson_serial_console is the name of the structure. I would
expect the name of the .write callback, like
meson_serial_console_write().
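
To double-check this kind of entry, a small sketch follows. It assumes
the usual "symbol+0xOFFSET/0xSIZE" notation of kernel call traces and
the nm-style type letters used by /proc/kallsyms (t/T for code, d/D or
b/B for data). The helper names decode_frame and sym_type are made up
for illustration.

```sh
# Sketch only; both function names are hypothetical helpers.

decode_frame() {
    # $1 looks like "symbol+0xOFFSET/0xSIZE"
    sym=${1%%+*}        # text before the first "+"
    rest=${1#*+}        # "0xOFFSET/0xSIZE"
    off=${rest%%/*}
    size=${rest#*/}
    printf '%s offset=%d size=%d\n' "$sym" "$off" "$size"
}

sym_type() {
    # $1: symbol name, $2: kallsyms-style file ("address type name")
    awk -v s="$1" '$3 == s { print $2; exit }' "${2:-/proc/kallsyms}"
}
```

With the entry from the trace:

```sh
decode_frame 'meson_serial_console+0x78/0x118'
# prints: meson_serial_console offset=120 size=280
```

If "sym_type meson_serial_console" then reports a data type letter on
the affected board, that would support the reading that the unwinder
resolved an address inside the console structure instead of code.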

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
@ 2022-05-02 13:17                           ` Petr Mladek
  0 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-05-02 13:17 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> Hi John,
> 
> On 30.04.2022 18:00, John Ogness wrote:
> > On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >> The same issue happens if I boot with init=/bin/bash
> > Very interesting. Since you are seeing all the output up until you try
> > doing something, I guess receiving UART data is triggering the issue.
> 
> Right, this is how it looks like.
> 
> >> I found something really interesting. When lockup happens, I'm still
> >> able to log via ssh and trigger any magic sysrq action via
> >> /proc/sysrq-trigger.
> > If you boot the system and directly login via ssh (without sending any
> > data via serial), can you trigger printk output on the UART? For
> > example, with:
> >
> >      echo hello > /dev/kmsg
> >
> > (You might need to increase your loglevel to see it.)
> 
> Data written to /dev/kmsg and all kernel logs were always displayed 
> correctly. Also data written directly to /dev/ttyAML0 is displayed 
> properly on the console. The latter doesn't however trigger the input 
> related activity.
> 
> It looks that the data read from the uart is delivered only if other 
> activity happens on the kernel console. If I type 'reboot' and press 
> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via 
> ssh then, I only see the date printed on the console. However if I type 
> 'date >/dev/kmsg', the the date is printed and reboot happens.

This is really interesting.

'date >/dev/kmsg' should be handled like a normal printk().
It should get pushed to the console using printk kthread,
that calls call_console_driver() that calls con->write()
callback. In our case, it should be meson_serial_console_write().

I am not sure if meson_serial_console_write() is used also
when writing via /dev/ttyAML0.

> 
> >> It turned out that the UART console is somehow blocked, but it
> >> receives and buffers all the input. For example after issuing "echo
> >>   >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >> updated and I see the magic sysrq banner and then all the commands I
> >> blindly typed in the UART console! However this doesn't unblock the
> >> console.
> > sysrq falls back to direct printing. This would imply that the kthread
> > printer is somehow unable to print.
> >
> >> Here is the output of 't' magic sys request:
> >>
> >> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> > It looks like the call trace for the printing kthread (pr/ttyAML0) is
> > corrupt.
> 
> Right, good catch. For comparison, here is a 't' sysrq result from the 
> 'working' serial console (next-20220429), which happens usually 1 of 4 
> boots:
> 
> https://pastebin.com/mp8zGFbW

Strange. The backtrace is weird here too:

[   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
[   50.514540] Call trace:
[   50.514548]  __switch_to+0xe8/0x160
[   50.514561]  meson_serial_console+0x78/0x118

There should be kthread() and printk_kthread_func() on the stack.

Hmm, meson_serial_console+0x78/0x118 is weird on its own.
meson_serial_console is the name of the structure. I would
expect the name of the .write callback, like
meson_serial_console_write().
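
One way to check what kind of symbol a backtrace frame names is to look
it up in /proc/kallsyms, where the second column is the nm-style symbol
type (t/T for text, d/D for initialized data, b/B for bss, r/R for
read-only data). The following is only a sketch and not something used in
this thread: the function name frame_symbol_type is hypothetical, and the
kallsyms file is an optional argument so that a saved copy can be used.

```shell
# Hypothetical helper (not from the thread): classify the symbol named in a
# backtrace frame such as "meson_serial_console+0x78/0x118".
# Usage: frame_symbol_type FRAME [KALLSYMS_FILE]
frame_symbol_type() {
    frame=$1
    kallsyms=${2:-/proc/kallsyms}

    # Strip the "+0xOFFSET/0xSIZE" suffix to get the bare symbol name.
    sym=${frame%%+*}

    # kallsyms lines look like: "<address> <type> <name> [module]".
    awk -v sym="$sym" '
        $3 == sym {
            type = $2
            if (type ~ /^[tT]$/)      kind = "text (function)"
            else if (type ~ /^[dD]$/) kind = "data"
            else if (type ~ /^[bB]$/) kind = "bss"
            else if (type ~ /^[rR]$/) kind = "read-only data"
            else                      kind = "other"
            printf "%s: type %s, %s\n", sym, type, kind
            found = 1
            exit
        }
        END { if (!found) { printf "%s: not found\n", sym; exit 1 } }
    ' "$kallsyms"
}
```

If the frame really names the struct console rather than a function,
running frame_symbol_type 'meson_serial_console+0x78/0x118' on the
affected board would be expected to report a data symbol.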

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:11                           ` John Ogness
@ 2022-05-02 22:29                             ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-02 22:29 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi John,

On 02.05.2022 15:11, John Ogness wrote:
> On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter doesn't however trigger the input
>> related activity.
>>
>> It looks that the data read from the uart is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>> ssh then, I only see the date printed on the console. However if I
>> type 'date >/dev/kmsg', the the date is printed and reboot happens.
> I suppose if you login via ssh and check /proc/interrupts, then type
> some things over serial, then check /proc/interrupts again, you will see
> there have been no interrupts for the uart. But interrupts for other
> devices are happening. Is this correct?

Right. The counter for ttyAML0 does not increase when the lockup happens
and I type something into the uart console.
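
The before/after comparison of /proc/interrupts can be scripted. The
sketch below is an assumption about how one might do it, not the procedure
actually used here: irq_total is a hypothetical helper that sums the
per-CPU counters of every line whose last field is the given name. It only
looks at the last field, so a shared interrupt line that lists several
names would need a different match.

```shell
# Hypothetical helper (not from the thread): sum the per-CPU interrupt counts
# of every /proc/interrupts line whose last field is the given device name.
# Usage: irq_total NAME [INTERRUPTS_FILE]
irq_total() {
    name=$1
    file=${2:-/proc/interrupts}

    awk -v name="$name" '
        # Skip the "CPU0 CPU1 ..." header; match on the device name.
        NR > 1 && $NF == name {
            # Fields 2..N are per-CPU counters until the first non-number
            # (the interrupt chip name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    ' "$file"
}

# Example use (device name taken from the thread):
#   before=$(irq_total ttyAML0)
#   ... type something on the serial console ...
#   after=$(irq_total ttyAML0)
#   echo "UART interrupts seen: $((after - before))"
```

A difference of zero while other devices keep counting would match the
observation that no uart interrupts arrive during the lockup.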

>> For comparison, here is a 't' sysrq result from the 'working' serial
>> console (next-20220429), which happens usually 1 of 4 boots:
>>
>> https://protect2.fireeye.com/v1/url?k=3ef0fd63-5f7be855-3ef1762c-000babff9b5d-2e40dc5adc30a14c&q=1&e=1469838f-8586-403e-bd4d-922675d8b658&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> This still looks odd to me. We should be seeing a trace originating from
> ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.
>
> I wonder if the early creation of the thread is somehow causing
> problems. Could you try the following patch to see if it makes a
> difference? I would also like to see the sysrq-t output with this patch
> applied:
>
> ---------------- BEGIN PATCH ---------------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2311a0ad584a..c4362d25de22 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
>   
>   	return 0;
>   }
> -early_initcall(printk_activate_kthreads);
> +late_initcall(printk_activate_kthreads);
>   
>   #if defined CONFIG_PRINTK
>   /* If @con is specified, only wait for that console. Otherwise wait for all. */
> ---------------- END PATCH ---------------
>
> Thanks for your help with this!

Well, nothing has changed. The lockup still happens.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:17                           ` Petr Mladek
@ 2022-05-02 23:13                             ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-02 23:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi Petr,

On 02.05.2022 15:17, Petr Mladek wrote:
> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>> On 30.04.2022 18:00, John Ogness wrote:
>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>> The same issue happens if I boot with init=/bin/bash
>>> Very interesting. Since you are seeing all the output up until you try
>>> doing something, I guess receiving UART data is triggering the issue.
>> Right, this is how it looks like.
>>
>>>> I found something really interesting. When lockup happens, I'm still
>>>> able to log via ssh and trigger any magic sysrq action via
>>>> /proc/sysrq-trigger.
>>> If you boot the system and directly login via ssh (without sending any
>>> data via serial), can you trigger printk output on the UART? For
>>> example, with:
>>>
>>>       echo hello > /dev/kmsg
>>>
>>> (You might need to increase your loglevel to see it.)
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter doesn't however trigger the input
>> related activity.
>>
>> It looks that the data read from the uart is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>> ssh then, I only see the date printed on the console. However if I type
>> 'date >/dev/kmsg', the the date is printed and reboot happens.
> This is really interesting.
>
> 'date >/dev/kmsg' should be handled like a normal printk().
> It should get pushed to the console using printk kthread,
> that calls call_console_driver() that calls con->write()
> callback. In our case, it should be meson_serial_console_write().
>
> I am not sure if meson_serial_console_write() is used also
> when writing via /dev/ttyAML0.
>
>>>> It turned out that the UART console is somehow blocked, but it
>>>> receives and buffers all the input. For example after issuing "echo
>>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>> updated and I see the magic sysrq banner and then all the commands I
>>>> blindly typed in the UART console! However this doesn't unblock the
>>>> console.
>>> sysrq falls back to direct printing. This would imply that the kthread
>>> printer is somehow unable to print.
>>>
>>>> Here is the output of 't' magic sys request:
>>>>
>>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>> corrupt.
>> Right, good catch. For comparison, here is a 't' sysrq result from the
>> 'working' serial console (next-20220429), which happens usually 1 of 4
>> boots:
>>
>> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> Strange. The backtrace is weird here too:
>
> [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> [   50.514540] Call trace:
> [   50.514548]  __switch_to+0xe8/0x160
> [   50.514561]  meson_serial_console+0x78/0x118
>
> There should be kthread() and printk_kthread_func() on the stack.
>
> Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
> meson_serial_console is the name of the structure. I would
> expect a name of the .write callback, like
> meson_serial_console_write()

Well, this sounds a bit like stack corruption. For reference, I've
checked what magic sysrq 't' reports on my other test boards:

RaspberryPi4:

[  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 ppid:     2 flags:0x00000008
[  166.711069] Call trace:
[  166.713647]  __switch_to+0xe8/0x160
[  166.717216]  __schedule+0x2f4/0x9f0
[  166.720862]  log_wait+0x0/0x50
[  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 ppid:     2 flags:0x00000008
[  166.732698] Call trace:


ARM Juno R1:

[   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 pid:   47 ppid:     2 flags:0x00000008
[   74.356605] Call trace:
[   74.356617]  __switch_to+0xe8/0x160
[   74.356637]  amba_console+0x78/0x118
[   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 ppid:     2 flags:0x00000008
[   74.356695] Workqueue:  0x0 (mm_percpu_wq)
[   74.356738] Call trace:


QEMU virt/arm64:

[  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[  174.156305] Call trace:
[  174.156529]  __switch_to+0xe8/0x160
[  174.157131]  0xffff5ebbbfdd62d8


In the last case this does not always happen. In other runs I got the
following log from QEMU virt/arm64:

[  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[  200.538121] Call trace:
[  200.538341]  __switch_to+0xe8/0x160
[  200.538583]  __schedule+0x2f4/0x9f0
[  200.538822]  schedule+0x54/0xd0
[  200.539047]  printk_kthread_func+0x2d8/0x3bc
[  200.539301]  kthread+0x118/0x11c
[  200.539523]  ret_from_fork+0x10/0x20
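
Comparing traces like the ones above by eye gets tedious across boards, so
the check for the expected frames could be scripted. This is only a sketch
with hypothetical helper names (task_frames, trace_has_frames). It assumes
the arm64 sysrq-t layout shown above, with a "task:NAME" header followed by
frames in symbol+0xoffset/0xsize form, and it skips raw-address frames.

```shell
# Hypothetical helpers (not from the thread) for a saved sysrq-t dump.

# Print the symbol names in the call trace of one task.
# Usage: task_frames TASK [FILE]    (FILE defaults to stdin)
task_frames() {
    task=$1
    file=${2:--}
    awk -v task="$task" '
        {
            line = $0
            # Drop the "[  123.456789] " timestamp prefix, if present.
            sub(/^\[ *[0-9]+\.[0-9]+\] ?/, "", line)
        }
        line ~ /^task:/ {
            # A new task header starts; remember whether it is ours.
            split(line, hdr, /[ \t]+/)
            in_task = (hdr[1] == "task:" task)
            next
        }
        in_task {
            sub(/^[ \t]+/, "", line)
            # Keep only frames of the form symbol+0xOFF/0xSIZE.
            if (line ~ /^[^ ]+\+0x[0-9a-f]+\/0x[0-9a-f]+/) {
                sub(/\+0x.*/, "", line)
                print line
            }
        }
    ' "$file"
}

# Succeed only if every expected frame appears in the trace of TASK.
# Usage: trace_has_frames TASK FILE FRAME...
trace_has_frames() {
    task=$1; file=$2; shift 2
    frames=$(task_frames "$task" "$file")
    for want in "$@"; do
        printf '%s\n' "$frames" | grep -qxF -- "$want" || return 1
    done
}
```

For a sleeping printing thread, something like
trace_has_frames pr/ttyAMA0 dump.txt printk_kthread_func kthread
ret_from_fork (with dump.txt being a saved sysrq-t log) would be expected
to succeed on the complete trace and fail on the truncated ones.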


I hope that at least the qemu case will let you analyze it yourself.
I run my test system with the following command:

qemu-system-aarch64 -kernel virt/Image \
  -append "console=ttyAMA0 root=/dev/vda rootwait" \
  -M virt -cpu cortex-a57 -smp 2 -m 1024 \
  -device virtio-blk-device,drive=virtio-blk0 \
  -drive file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw \
  -netdev user,id=user -device virtio-net-device,netdev=user \
  -display none


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 23:13                             ` Marek Szyprowski
@ 2022-05-03  6:49                               ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-05-03  6:49 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
> Hi Petr,
> 
> On 02.05.2022 15:17, Petr Mladek wrote:
> > On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> >> On 30.04.2022 18:00, John Ogness wrote:
> >>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >>>> The same issue happens if I boot with init=/bin/bash
> >>> Very interesting. Since you are seeing all the output up until you try
> >>> doing something, I guess receiving UART data is triggering the issue.
> >> Right, this is how it looks like.
> >>
> >>>> I found something really interesting. When lockup happens, I'm still
> >>>> able to log via ssh and trigger any magic sysrq action via
> >>>> /proc/sysrq-trigger.
> >>> If you boot the system and directly login via ssh (without sending any
> >>> data via serial), can you trigger printk output on the UART? For
> >>> example, with:
> >>>
> >>>       echo hello > /dev/kmsg
> >>>
> >>> (You might need to increase your loglevel to see it.)
> >> Data written to /dev/kmsg and all kernel logs were always displayed
> >> correctly. Also data written directly to /dev/ttyAML0 is displayed
> >> properly on the console. The latter doesn't however trigger the input
> >> related activity.
> >>
> >> It looks that the data read from the uart is delivered only if other
> >> activity happens on the kernel console. If I type 'reboot' and press
> >> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> >> ssh then, I only see the date printed on the console. However if I type
> >> 'date >/dev/kmsg', the the date is printed and reboot happens.
> > This is really interesting.
> >
> > 'date >/dev/kmsg' should be handled like a normal printk().
> > It should get pushed to the console using printk kthread,
> > that calls call_console_driver() that calls con->write()
> > callback. In our case, it should be meson_serial_console_write().
> >
> > I am not sure if meson_serial_console_write() is used also
> > when writing via /dev/ttyAML0.
> >
> >>>> It turned out that the UART console is somehow blocked, but it
> >>>> receives and buffers all the input. For example after issuing "echo
> >>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >>>> updated and I see the magic sysrq banner and then all the commands I
> >>>> blindly typed in the UART console! However this doesn't unblock the
> >>>> console.
> >>> sysrq falls back to direct printing. This would imply that the kthread
> >>> printer is somehow unable to print.
> >>>
> >>>> Here is the output of 't' magic sys request:
> >>>>
> >>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> >>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
> >>> corrupt.
> >> Right, good catch. For comparison, here is a 't' sysrq result from the
> >> 'working' serial console (next-20220429), which happens usually 1 of 4
> >> boots:
> >>
> >> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> > Strange. The backtrace is weird here too:
> >
> > [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> > [   50.514540] Call trace:
> > [   50.514548]  __switch_to+0xe8/0x160
> > [   50.514561]  meson_serial_console+0x78/0x118
> >
> > There should be kthread() and printk_kthread_func() on the stack.
> >
> > Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
> > meson_serial_console is the name of the structure. I would
> > expect a name of the .write callback, like
> > meson_serial_console_write()
> 
> Well, this sounds a bit like a stack corruption. For the reference, I've 
> checked what magic sysrq 't' reports for my other test boards:
> 
> RaspberryPi4:
> 
> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 
> ppid:     2 flags:0x00000008
> [  166.711069] Call trace:
> [  166.713647]  __switch_to+0xe8/0x160
> [  166.717216]  __schedule+0x2f4/0x9f0
> [  166.720862]  log_wait+0x0/0x50
> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 
> ppid:     2 flags:0x00000008
> [  166.732698] Call trace:
> 
> 
> ARM Juno R1:
> 
> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 
> pid:   47 ppid:     2 flags:0x00000008
> [   74.356605] Call trace:
> [   74.356617]  __switch_to+0xe8/0x160
> [   74.356637]  amba_console+0x78/0x118
> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 
> ppid:     2 flags:0x00000008
> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
> [   74.356738] Call trace:
> 
> 
> QEMU virt/arm64:
> 
> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
> ppid:     2 flags:0x00000008
> [  174.156305] Call trace:
> [  174.156529]  __switch_to+0xe8/0x160
> [  174.157131]  0xffff5ebbbfdd62d8

You mentioned in the other mail that the other boards work as
expected, that is, the console gets stuck only on the meson board.
Is that correct?

The stack looks really weird. But another weird thing is that
even the meson board is able to show messages, for example,
using echo hello >/dev/kmsg. This suggests that the kthreads
work, at least to some extent.

There is also a possibility that this code path is optimized in
some special way and the unwinder has trouble showing
the stack correctly.


> In the last case it doesn't happen always. In the other runs I got the 
> following log from QEMU virt/arm64:
> 
> [  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
> ppid:     2 flags:0x00000008
> [  200.538121] Call trace:
> [  200.538341]  __switch_to+0xe8/0x160
> [  200.538583]  __schedule+0x2f4/0x9f0
> [  200.538822]  schedule+0x54/0xd0
> [  200.539047]  printk_kthread_func+0x2d8/0x3bc
> [  200.539301]  kthread+0x118/0x11c
> [  200.539523]  ret_from_fork+0x10/0x20

This is what I would expect when the kthread is in an interruptible
sleep waiting for new strings to handle.

BTW: This is what I see on my x86_64 test system:

[61892.932242] task:pr/tty0         state:S stack:    0 pid:   14 ppid:     2 flags:0x00004000
[61892.932250] Call Trace:
[61892.932253]  <TASK>
[61892.932263]  __schedule+0x376/0xbb0
[61892.932284]  schedule+0x44/0xb0
[61892.932290]  printk_kthread_func+0x18f/0x370
[61892.932303]  ? schedstat_stop+0x10/0x10
[61892.932316]  ? console_start+0x30/0x30
[61892.932322]  kthread+0xf2/0x120
[61892.932327]  ? kthread_complete_and_exit+0x20/0x20
[61892.932338]  ret_from_fork+0x1f/0x30
[61892.932370]  </TASK>
[61892.932373] task:pr/ttyS0        state:R  running task     stack:    0 pid:   15 ppid:     2 flags:0x00004000
[61892.932391] Call Trace:
[61892.932398]  <TASK>
[61892.932427]  ? printk_kthread_func+0x15b/0x370
[61892.932436]  ? schedstat_stop+0x10/0x10
[61892.932449]  ? console_start+0x30/0x30
[61892.932455]  ? kthread+0xf2/0x120
[61892.932460]  ? kthread_complete_and_exit+0x20/0x20
[61892.932471]  ? ret_from_fork+0x1f/0x30
[61892.932502]  </TASK>


pr/tty0 is in an interruptible sleep and the stack looks reasonable.

pr/ttyS0 is in the runnable state and the stack is weird. The '?' means
that the address was found on the stack and belongs to some
function, but the unwinder was not able to assign it to the current
call path by walking back through the return addresses stored on the stack.
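
To get a quick impression of how much of an x86 trace the unwinder could
actually follow, the '?' entries can be counted. The helper below is a
hypothetical sketch (frame_reliability is not an existing tool). It expects
the trace of a single task on stdin or in a file and treats every
"? symbol+0x..." line as unreliable.

```shell
# Hypothetical helper (not from the thread): count reliable and unreliable
# ("? "-prefixed) frames in one task's x86 call trace.
# Usage: frame_reliability [FILE]    (FILE defaults to stdin)
frame_reliability() {
    awk '
        {
            line = $0
            sub(/^\[ *[0-9]+\.[0-9]+\] ?/, "", line)   # drop the timestamp
            sub(/^[ \t]+/, "", line)                   # drop the indentation
        }
        # "? symbol+0x..." was found on the stack but is not on the call path.
        line ~ /^\? [^ ]+\+0x/ { unreliable++; next }
        # "symbol+0x..." was reached by following the return addresses.
        line ~ /^[^ ]+\+0x/    { reliable++ }
        END { printf "reliable=%d unreliable=%d\n", reliable, unreliable }
    ' "${1:--}"
}
```

Fed the pr/tty0 trace above, this should report five reliable and three
unreliable frames. For pr/ttyS0 all six frames are unreliable.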


> I hope that at least the qemu case will let you to analyze it by 
> yourself. I run my test system with the following command:
> 
> qemu-system-aarch64 -kernel virt/Image -append "console=ttyAMA0 
> root=/dev/vda rootwait" -M virt -cpu cortex-a57 -smp 2 -m 1024 -device 
> virtio-blk-device,drive=virtio-blk0 -drive 
> file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw -netdev 
> user,id=user -device virtio-net-device,netdev=user -display none

Thanks a lot for all the information. It is really helpful.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
@ 2022-05-03  6:49                               ` Petr Mladek
  0 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-05-03  6:49 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
> Hi Petr,
> 
> On 02.05.2022 15:17, Petr Mladek wrote:
> > On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> >> On 30.04.2022 18:00, John Ogness wrote:
> >>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >>>> The same issue happens if I boot with init=/bin/bash
> >>> Very interesting. Since you are seeing all the output up until you try
> >>> doing something, I guess receiving UART data is triggering the issue.
> >> Right, this is how it looks like.
> >>
> >>>> I found something really interesting. When lockup happens, I'm still
> >>>> able to log via ssh and trigger any magic sysrq action via
> >>>> /proc/sysrq-trigger.
> >>> If you boot the system and directly login via ssh (without sending any
> >>> data via serial), can you trigger printk output on the UART? For
> >>> example, with:
> >>>
> >>>       echo hello > /dev/kmsg
> >>>
> >>> (You might need to increase your loglevel to see it.)
> >> Data written to /dev/kmsg and all kernel logs were always displayed
> >> correctly. Also data written directly to /dev/ttyAML0 is displayed
> >> properly on the console. The latter doesn't however trigger the input
> >> related activity.
> >>
> >> It looks that the data read from the uart is delivered only if other
> >> activity happens on the kernel console. If I type 'reboot' and press
> >> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> >> ssh then, I only see the date printed on the console. However if I type
> >> 'date >/dev/kmsg', the the date is printed and reboot happens.
> > This is really interesting.
> >
> > 'date >/dev/kmsg' should be handled like a normal printk().
> > It should get pushed to the console using printk kthread,
> > that calls call_console_driver() that calls con->write()
> > callback. In our case, it should be meson_serial_console_write().
> >
> > I am not sure if meson_serial_console_write() is used also
> > when writing via /dev/ttyAML0.
> >
> >>>> It turned out that the UART console is somehow blocked, but it
> >>>> receives and buffers all the input. For example after issuing "echo
> >>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >>>> updated and I see the magic sysrq banner and then all the commands I
> >>>> blindly typed in the UART console! However this doesn't unblock the
> >>>> console.
> >>> sysrq falls back to direct printing. This would imply that the kthread
> >>> printer is somehow unable to print.
> >>>
> >>>> Here is the output of 't' magic sys request:
> >>>>
> >>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> >>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
> >>> corrupt.
> >> Right, good catch. For comparison, here is a 't' sysrq result from the
> >> 'working' serial console (next-20220429), which happens usually 1 of 4
> >> boots:
> >>
> >> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> > Strange. The backtrace is weird here too:
> >
> > [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> > [   50.514540] Call trace:
> > [   50.514548]  __switch_to+0xe8/0x160
> > [   50.514561]  meson_serial_console+0x78/0x118
> >
> > There should be kthread() and printk_kthread_func() on the stack.
> >
> > Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
> > meson_serial_console is the name of the structure. I would
> > expect a name of the .write callback, like
> > meson_serial_console_write()
> 
> Well, this sounds a bit like a stack corruption. For the reference, I've 
> checked what magic sysrq 't' reports for my other test boards:
> 
> RaspberryPi4:
> 
> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 ppid:     2 flags:0x00000008
> [  166.711069] Call trace:
> [  166.713647]  __switch_to+0xe8/0x160
> [  166.717216]  __schedule+0x2f4/0x9f0
> [  166.720862]  log_wait+0x0/0x50
> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 ppid:     2 flags:0x00000008
> [  166.732698] Call trace:
>
>
> ARM Juno R1:
>
> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 pid:   47 ppid:     2 flags:0x00000008
> [   74.356605] Call trace:
> [   74.356617]  __switch_to+0xe8/0x160
> [   74.356637]  amba_console+0x78/0x118
> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 ppid:     2 flags:0x00000008
> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
> [   74.356738] Call trace:
>
>
> QEMU virt/arm64:
>
> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [  174.156305] Call trace:
> [  174.156529]  __switch_to+0xe8/0x160
> [  174.157131]  0xffff5ebbbfdd62d8

You mentioned in the other mail that the other boards work as
expected. That is, the console gets stuck only on the meson board.
Is that right?

The stack looks really weird. But another weird thing is that
even the meson board is able to show the messages, for example,
using echo hello >/dev/kmsg. It suggests that the kthreads
somehow work.

There is also a possibility that this code path is optimized
in some special way and the unwinder has trouble showing
the stack correctly.


> In the last case it doesn't happen always. In the other runs I got the 
> following log from QEMU virt/arm64:
> 
> [  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [  200.538121] Call trace:
> [  200.538341]  __switch_to+0xe8/0x160
> [  200.538583]  __schedule+0x2f4/0x9f0
> [  200.538822]  schedule+0x54/0xd0
> [  200.539047]  printk_kthread_func+0x2d8/0x3bc
> [  200.539301]  kthread+0x118/0x11c
> [  200.539523]  ret_from_fork+0x10/0x20

This is what I would expect when the kthread is in an interruptible
sleep waiting for new strings to handle.

BTW: This is what I see on my x86_64 test system:

[61892.932242] task:pr/tty0         state:S stack:    0 pid:   14 ppid:     2 flags:0x00004000
[61892.932250] Call Trace:
[61892.932253]  <TASK>
[61892.932263]  __schedule+0x376/0xbb0
[61892.932284]  schedule+0x44/0xb0
[61892.932290]  printk_kthread_func+0x18f/0x370
[61892.932303]  ? schedstat_stop+0x10/0x10
[61892.932316]  ? console_start+0x30/0x30
[61892.932322]  kthread+0xf2/0x120
[61892.932327]  ? kthread_complete_and_exit+0x20/0x20
[61892.932338]  ret_from_fork+0x1f/0x30
[61892.932370]  </TASK>
[61892.932373] task:pr/ttyS0        state:R  running task     stack:    0 pid:   15 ppid:     2 flags:0x00004000
[61892.932391] Call Trace:
[61892.932398]  <TASK>
[61892.932427]  ? printk_kthread_func+0x15b/0x370
[61892.932436]  ? schedstat_stop+0x10/0x10
[61892.932449]  ? console_start+0x30/0x30
[61892.932455]  ? kthread+0xf2/0x120
[61892.932460]  ? kthread_complete_and_exit+0x20/0x20
[61892.932471]  ? ret_from_fork+0x1f/0x30
[61892.932502]  </TASK>


pr/tty0 is in interruptible sleep and its stack looks reasonable.

pr/ttyS0 is in the runnable state and its stack is weird. The '?' means
that the address was found on the stack and belongs to some
function, but the unwinder cannot tie it to the current call path
by walking back through the return addresses stored on the stack.
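
The '?' convention described above can be checked mechanically when
comparing many dumps. The following is only a userspace sketch, assuming
the "[timestamp]  frame" line layout seen in the x86_64 dump; the helper
names frame_is_unreliable() and count_unreliable() are hypothetical and
not part of any kernel tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true when a backtrace line carries the '?' marker, i.e. the
 * address was found on the stack but the unwinder could not tie it to
 * the call chain. Assumes the "[timestamp]  frame" layout of the dumps.
 */
static bool frame_is_unreliable(const char *line)
{
	const char *p = strchr(line, ']');	/* skip "[61892.932427]" if present */

	p = p ? p + 1 : line;
	while (*p == ' ')
		p++;
	return p[0] == '?' && p[1] == ' ';
}

/* Count the '?' frames in an array of backtrace lines. */
static size_t count_unreliable(const char *const *lines, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (frame_is_unreliable(lines[i]))
			count++;
	return count;
}
```

A task whose trace consists only of '?' frames, like pr/ttyS0 above, was
most likely caught while running, so such a dump alone does not prove
that its stack is corrupted.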


> I hope that at least the qemu case will let you to analyze it by 
> yourself. I run my test system with the following command:
> 
> qemu-system-aarch64 -kernel virt/Image -append "console=ttyAMA0 
> root=/dev/vda rootwait" -M virt -cpu cortex-a57 -smp 2 -m 1024 -device 
> virtio-blk-device,drive=virtio-blk0 -drive 
> file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw -netdev 
> user,id=user -device virtio-net-device,netdev=user -display none

Thanks a lot for all the information. It is really helpful.

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 22:29                             ` Marek Szyprowski
@ 2022-05-04  5:56                               ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-04  5:56 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> I suppose if you login via ssh and check /proc/interrupts, then type
>> some things over serial, then check /proc/interrupts again, you will
>> see there have been no interrupts for the uart. But interrupts for
>> other devices are happening. Is this correct?
>
> Right. The counter for ttyAML0 is not increased when lockup happens
> and I type something to the uart console.

Hmmm. This would imply that the interrupts are disabled for the UART.
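
The before/after check on /proc/interrupts can be scripted. This is a
minimal sketch, assuming the usual "IRQ: per-CPU counters, chip, name"
line layout; irq_line_total() is a hypothetical helper and the sample
line in the usage note is made up for illustration, not taken from the
meson board.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters of one /proc/interrupts line.
 * "ncpus" is the number of counter columns to read; parsing stops
 * early if a column does not start with a digit.
 */
static unsigned long irq_line_total(const char *line, int ncpus)
{
	const char *p = strchr(line, ':');	/* counters follow "NN:" */
	unsigned long total = 0;

	if (!p)
		return 0;
	p++;
	for (int i = 0; i < ncpus; i++) {
		char *end;
		unsigned long n = strtoul(p, &end, 10);

		if (end == p)	/* no digits where a counter was expected */
			break;
		total += n;
		p = end;
	}
	return total;
}
```

Taking the total for the UART line once before and once after typing on
the serial console, and finding the two values equal, matches what Marek
reports: no RX interrupt reached the kernel.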

Just to be sure that we haven't corrupted something in the driver, if
you make the following change, everything works, right?

--------- BEGIN PATCH ------
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7973266b176..1eaa323e335c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3578,7 +3578,7 @@ static int __init printk_activate_kthreads(void)
 	struct console *con;
 
 	console_lock();
-	printk_kthreads_available = true;
+	//printk_kthreads_available = true;
 	for_each_console(con)
 		printk_start_kthread(con);
 	console_unlock();
--------- END PATCH ------

The above change will cause the kthreads to not print and instead always
fall back to the direct method.
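
Why leaving printk_kthreads_available unset forces direct printing can
be modelled in userspace. This is a simplified sketch, not the code from
the series: struct printk_model and model_use_direct() are hypothetical
names, and the real condition in printk.c may check more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the printk state discussed here. */
struct printk_model {
	bool kthreads_available;	/* models printk_kthreads_available */
	bool emergency;			/* models panic-like situations */
	int prefer_direct;		/* nesting of printk_prefer_direct_enter() */
};

/* Direct printing is used whenever the kthreads cannot be relied upon. */
static bool model_use_direct(const struct printk_model *m)
{
	return !m->kthreads_available || m->emergency || m->prefer_direct > 0;
}
```

With the debug patch applied, kthreads_available stays false in this
model, so every printk() caller prints directly even though the kthreads
have been started.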

John

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-03  6:49                               ` Petr Mladek
@ 2022-05-04  6:05                                 ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-04  6:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 03.05.2022 08:49, Petr Mladek wrote:
> On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
>> On 02.05.2022 15:17, Petr Mladek wrote:
>>> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>>>> On 30.04.2022 18:00, John Ogness wrote:
>>>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>> The same issue happens if I boot with init=/bin/bash
>>>>> Very interesting. Since you are seeing all the output up until you try
>>>>> doing something, I guess receiving UART data is triggering the issue.
>>>> Right, this is how it looks like.
>>>>
>>>>>> I found something really interesting. When lockup happens, I'm still
>>>>>> able to log via ssh and trigger any magic sysrq action via
>>>>>> /proc/sysrq-trigger.
>>>>> If you boot the system and directly login via ssh (without sending any
>>>>> data via serial), can you trigger printk output on the UART? For
>>>>> example, with:
>>>>>
>>>>>        echo hello > /dev/kmsg
>>>>>
>>>>> (You might need to increase your loglevel to see it.)
>>>> Data written to /dev/kmsg and all kernel logs were always displayed
>>>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>>>> properly on the console. The latter doesn't however trigger the input
>>>> related activity.
>>>>
>>>> It looks that the data read from the uart is delivered only if other
>>>> activity happens on the kernel console. If I type 'reboot' and press
>>>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>>>> ssh then, I only see the date printed on the console. However if I type
>>>> 'date >/dev/kmsg', the date is printed and reboot happens.
>>> This is really interesting.
>>>
>>> 'date >/dev/kmsg' should be handled like a normal printk().
>>> It should get pushed to the console using printk kthread,
>>> that calls call_console_driver() that calls con->write()
>>> callback. In our case, it should be meson_serial_console_write().
>>>
>>> I am not sure if meson_serial_console_write() is used also
>>> when writing via /dev/ttyAML0.
>>>
>>>>>> It turned out that the UART console is somehow blocked, but it
>>>>>> receives and buffers all the input. For example after issuing "echo
>>>>>>     >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>>>> updated and I see the magic sysrq banner and then all the commands I
>>>>>> blindly typed in the UART console! However this doesn't unblock the
>>>>>> console.
>>>>> sysrq falls back to direct printing. This would imply that the kthread
>>>>> printer is somehow unable to print.
>>>>>
>>>>>> Here is the output of 't' magic sys request:
>>>>>>
>>>>>> https://pastebin.com/fjbRuy4f
>>>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>>>> corrupt.
>>>> Right, good catch. For comparison, here is a 't' sysrq result from the
>>>> 'working' serial console (next-20220429), which happens usually 1 of 4
>>>> boots:
>>>>
>>>> https://pastebin.com/mp8zGFbW
>>> Strange. The backtrace is weird here too:
>>>
>>> [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
>>> [   50.514540] Call trace:
>>> [   50.514548]  __switch_to+0xe8/0x160
>>> [   50.514561]  meson_serial_console+0x78/0x118
>>>
>>> There should be kthread() and printk_kthread_func() on the stack.
>>>
>>> Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
>>> meson_serial_console is the name of the structure. I would
>>> expect a name of the .write callback, like
>>> meson_serial_console_write()
>> Well, this sounds a bit like a stack corruption. For the reference, I've
>> checked what magic sysrq 't' reports for my other test boards:
>>
>> RaspberryPi4:
>>
>> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 ppid:     2 flags:0x00000008
>> [  166.711069] Call trace:
>> [  166.713647]  __switch_to+0xe8/0x160
>> [  166.717216]  __schedule+0x2f4/0x9f0
>> [  166.720862]  log_wait+0x0/0x50
>> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 ppid:     2 flags:0x00000008
>> [  166.732698] Call trace:
>>
>>
>> ARM Juno R1:
>>
>> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 pid:   47 ppid:     2 flags:0x00000008
>> [   74.356605] Call trace:
>> [   74.356617]  __switch_to+0xe8/0x160
>> [   74.356637]  amba_console+0x78/0x118
>> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 ppid:     2 flags:0x00000008
>> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
>> [   74.356738] Call trace:
>>
>>
>> QEMU virt/arm64:
>>
>> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
>> [  174.156305] Call trace:
>> [  174.156529]  __switch_to+0xe8/0x160
>> [  174.157131]  0xffff5ebbbfdd62d8
> You mentioned in the other mail that the other boards work as
> expected. I mean that console gets stuck only on the meson board.
> Is it true, please?

Right. Even on Meson-based boards the console is operational in about
1 of 4 boots.


> The stack looks really weird. But another weird thing is that
> even the meson board is able to show the messages, for example,
> using echo hello >/dev/kmsg. It suggests that the kthreads
> somehow work.
>
> There is also a possibility that this code path is optimized
> some special way and the unwinder has troubles to show
> the stack correctly.

I doubt that this is a result of the compiler's optimization. See my
logs from QEMU's virt machine. I've managed to capture 2 states of the
ttyAMA0 task. One shows some kind of stack corruption imho. It doesn't
always happen, though.

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04  5:56                               ` John Ogness
@ 2022-05-04  6:52                                 ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-04  6:52 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi John,

On 04.05.2022 07:56, John Ogness wrote:
> On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> I suppose if you login via ssh and check /proc/interrupts, then type
>>> some things over serial, then check /proc/interrupts again, you will
>>> see there have been no interrupts for the uart. But interrupts for
>>> other devices are happening. Is this correct?
>> Right. The counter for ttyAML0 is not increased when lockup happens
>> and I type something to the uart console.
> Hmmm. This would imply that the interrupts are disabled for the UART.
>
> Just to be sure that we haven't corrupted something in the driver, if
> you make the following change, everything works, right?
>
> --------- BEGIN PATCH ------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7973266b176..1eaa323e335c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3578,7 +3578,7 @@ static int __init printk_activate_kthreads(void)
>   	struct console *con;
>   
>   	console_lock();
> -	printk_kthreads_available = true;
> +	//printk_kthreads_available = true;
>   	for_each_console(con)
>   		printk_start_kthread(con);
>   	console_unlock();
> --------- END PATCH ------
>
> The above change will cause the kthreads to not print and instead always
> fallback to the direct method.

With the above change the console always works.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 23:13                             ` Marek Szyprowski
@ 2022-05-04 21:11                               ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-04 21:11 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> QEMU virt/arm64:
>
> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [  174.156305] Call trace:
> [  174.156529]  __switch_to+0xe8/0x160
> [  174.157131]  0xffff5ebbbfdd62d8

I can reproduce the apparent stack corruption with qemu:

[    5.545268] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[    5.545520] Call trace:
[    5.545620]  __switch_to+0x104/0x160
[    5.545796]  __schedule+0x2f4/0x9f0
[    5.546122]  schedule+0x54/0xd0
[    5.546206]  0x0

When it happens, the printk-kthread is the only one with the corrupted
stack. It seems I am doing something wrong when creating the kthread? I
will investigate this.

Thanks Marek for helping us to narrow this down.

John

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04 21:11                               ` John Ogness
@ 2022-05-04 22:42                                 ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-04 22:42 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-05-04, John Ogness <john.ogness@linutronix.de> wrote:
> I can reproduce the apparent stack corruption with qemu:
>
> [    5.545268] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [    5.545520] Call trace:
> [    5.545620]  __switch_to+0x104/0x160
> [    5.545796]  __schedule+0x2f4/0x9f0
> [    5.546122]  schedule+0x54/0xd0
> [    5.546206]  0x0

I believe I am chasing a ghost. I can rather easily reproduce these
strange call traces, but if another sysrq-t is sent afterwards, the call
trace is OK. Also, I added trace_dump_stack() into the printk-kthread
main loop to dump the stack on every iteration. There I never see any
corruption, even though the timestamps are near the sysrq-t dump showing
corruption. Moving trace_dump_stack() into
amba-pl011:pl011_console_write() also showed no stack corruption at
times very close to those where sysrq-t did.

And it should be noted that the console-hanging issues reported in this
thread _cannot_ be reproduced with qemu.

So I will stop focussing on this "corrupt stack" thing and instead
investigate what the meson driver is doing that causes it to get
stuck. Since interrupts do not even fire, I'm guessing that the RX
interrupts are not being re-enabled (AML_UART_RX_INT_EN) for some code
path. This bit is only explicitly set once, in
meson_uart_startup(). Whenever the bit is cleared, the previous
value is restored later. This is assumed to mean the interrupt gets
re-enabled. But if there is some code path where multiple CPUs can
modify the register, then the interrupt could end up permanently
disabled.

I will go through and check if all access to AML_UART_CONTROL is
protected by port->lock.
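
The suspected failure mode, two unserialized save/mask/restore sequences
on the same control register, can be replayed deterministically in
userspace. This is only a model under stated assumptions: control_reg
stands in for AML_UART_CONTROL, the bit positions are illustrative, and
struct writer with its helpers is hypothetical, not the meson driver
code.

```c
#include <assert.h>
#include <stdint.h>

#define RX_INT_EN (UINT32_C(1) << 27)	/* illustrative bit position */
#define TX_INT_EN (UINT32_C(1) << 28)	/* illustrative bit position */

static uint32_t control_reg;	/* stands in for AML_UART_CONTROL */

struct writer {
	uint32_t saved;		/* value to restore when done */
};

/* Step 1 of a write path: remember the register, then mask interrupts. */
static void writer_mask(struct writer *w)
{
	w->saved = control_reg;
	control_reg = w->saved & ~(RX_INT_EN | TX_INT_EN);
}

/* Step 2 of a write path: put back whatever was remembered. */
static void writer_restore(const struct writer *w)
{
	control_reg = w->saved;
}

/* Both sequences run back to back, as they would under one lock. */
static int rx_enabled_after_serialized(void)
{
	struct writer a, b;

	control_reg = RX_INT_EN | TX_INT_EN;
	writer_mask(&a);
	writer_restore(&a);
	writer_mask(&b);
	writer_restore(&b);
	return (control_reg & RX_INT_EN) != 0;
}

/* B starts while A has the interrupts masked, so B saves "disabled". */
static int rx_enabled_after_interleaved(void)
{
	struct writer a, b;

	control_reg = RX_INT_EN | TX_INT_EN;
	writer_mask(&a);	/* A saves "enabled" and masks */
	writer_mask(&b);	/* B saves the already masked value */
	writer_restore(&a);	/* interrupts enabled again */
	writer_restore(&b);	/* B restores "disabled" and it sticks */
	return (control_reg & RX_INT_EN) != 0;
}
```

In the serialized case RX_INT_EN survives. In the interleaved case the
last restore writes back a value saved while the interrupts were masked,
which would match a UART that still transmits but never raises another
RX interrupt.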

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04 22:42                                 ` John Ogness
@ 2022-05-05 22:33                                   ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-05 22:33 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi Marek,

On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
> I will go through and check if all access to AML_UART_CONTROL is
> protected by port->lock.

The startup() callback of the uart_ops is not called with the port
locked. I'm having difficulties identifying if the startup() callback
can occur after the console was already registered via meson_uart_init()
and could be actively printing, but I see other serial drivers are
protecting their registers in the startup() callback with the
port->lock.

Could you try booting the meson hardware with the following change? (And
removing any previous debug changes I posted?)

John

diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c
index 2bf1c57e0981..f551b8603817 100644
--- a/drivers/tty/serial/meson_uart.c
+++ b/drivers/tty/serial/meson_uart.c
@@ -267,9 +267,12 @@ static void meson_uart_reset(struct uart_port *port)
 
 static int meson_uart_startup(struct uart_port *port)
 {
+	unsigned long flags;
 	u32 val;
 	int ret = 0;
 
+	spin_lock_irqsave(&port->lock, flags);
+
 	val = readl(port->membase + AML_UART_CONTROL);
 	val |= AML_UART_CLEAR_ERR;
 	writel(val, port->membase + AML_UART_CONTROL);
@@ -285,6 +288,8 @@ static int meson_uart_startup(struct uart_port *port)
 	val = (AML_UART_RECV_IRQ(1) | AML_UART_XMIT_IRQ(port->fifosize / 2));
 	writel(val, port->membase + AML_UART_MISC);
 
+	spin_unlock_irqrestore(&port->lock, flags);
+
 	ret = request_irq(port->irq, meson_uart_interrupt, 0,
 			  port->name, port);
 

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-05 22:33                                   ` John Ogness
@ 2022-05-06  6:43                                     ` Marek Szyprowski
  -1 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-06  6:43 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi John,

On 06.05.2022 00:33, John Ogness wrote:
> On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
>> I will go through and check if all access to AML_UART_CONTROL is
>> protected by port->lock.
> The startup() callback of the uart_ops is not called with the port
> locked. I'm having difficulties identifying if the startup() callback
> can occur after the console was already registered via meson_uart_init()
> and could be actively printing, but I see other serial drivers are
> protecting their registers in the startup() callback with the
> port->lock.
>
> Could you try booting the meson hardware with the following change? (And
> removing any previous debug changes I posted?)

Bingo! It looks like startup() is called when getty initializes the
console. This fixed the issues observed on the Amlogic Meson based boards.

Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                     ` Marek Szyprowski
@ 2022-05-06  7:55                                       ` Neil Armstrong
  -1 siblings, 0 replies; 99+ messages in thread
From: Neil Armstrong @ 2022-05-06  7:55 UTC (permalink / raw)
  To: Marek Szyprowski, John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 06/05/2022 08:43, Marek Szyprowski wrote:
> Hi John,
> 
> On 06.05.2022 00:33, John Ogness wrote:
>> On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
>>> I will go through and check if all access to AML_UART_CONTROL is
>>> protected by port->lock.
>> The startup() callback of the uart_ops is not called with the port
>> locked. I'm having difficulties identifying if the startup() callback
>> can occur after the console was already registered via meson_uart_init()
>> and could be actively printing, but I see other serial drivers are
>> protecting their registers in the startup() callback with the
>> port->lock.
>>
>> Could you try booting the meson hardware with the following change? (And
>> removing any previous debug changes I posted?)
> 
> Bingo! It looks like startup() is called when getty initializes the
> console. This fixed the issues observed on the Amlogic Meson based boards.
> 
> Feel free to add:
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Best regards

Thanks all for figuring out the issue. Perhaps other UART drivers could hit
the same issue if their startup code isn't protected by the lock?

Neil

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                     ` Marek Szyprowski
@ 2022-05-06  8:16                                       ` Petr Mladek
  -1 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2022-05-06  8:16 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Fri 2022-05-06 08:43:02, Marek Szyprowski wrote:
> Hi John,
> 
> On 06.05.2022 00:33, John Ogness wrote:
> > On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
> >> I will go through and check if all access to AML_UART_CONTROL is
> >> protected by port->lock.
> > The startup() callback of the uart_ops is not called with the port
> > locked. I'm having difficulties identifying if the startup() callback
> > can occur after the console was already registered via meson_uart_init()
> > and could be actively printing, but I see other serial drivers are
> > protecting their registers in the startup() callback with the
> > port->lock.

I guess that it is used by the early console before the racy
code is called.

> > Could you try booting the meson hardware with the following change? (And
> > removing any previous debug changes I posted?)
> 
> Bingo! It looks like startup() is called when getty initializes the
> console. This fixed the issues observed on the Amlogic Meson based boards.
> 
> Feel free to add:
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Uff, it is a huge relief that it has got fixed.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                     ` Marek Szyprowski
@ 2022-05-06  9:20                                       ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-06  9:20 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On 2022-05-06, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Could you try booting the meson hardware with the following change? (And
>> removing any previous debug changes I posted?)
>
> Bingo! It looks like startup() is called when getty initializes the
> console. This fixed the issues observed on the Amlogic Meson based boards.
>
> Feel free to add:
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Thanks Marek. I will post an official patch to the correct people/lists.

John

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
       [not found]             ` <CGME20220506112526eucas1p2a3688f87d3ed8331b99f2f876bf6c2f6@eucas1p2.samsung.com>
@ 2022-05-06 11:25               ` Marek Szyprowski
  2022-05-06 12:41                 ` John Ogness
  0 siblings, 1 reply; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-06 11:25 UTC (permalink / raw)
  To: Petr Mladek, John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-arm-msm, Bjorn Andersson,
	Andy Gross

Hi All,

On 27.04.2022 09:08, Marek Szyprowski wrote:
> On 26.04.2022 15:16, Petr Mladek wrote:
>> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>>> Currently threaded console printers synchronize against each
>>>> other using console_lock(). However, different console drivers
>>>> are unrelated and do not require any synchronization between
>>>> each other. Removing the synchronization between the threaded
>>>> console printers will allow each console to print at its own
>>>> speed.
>>>>
>>>> But the threaded consoles printers do still need to synchronize
>>>> against console_lock() callers. Introduce a per-console mutex
>>>> and a new console boolean field @blocked to provide this
>>>> synchronization.
>>>>
>>>> console_lock() is modified so that it must acquire the mutex
>>>> of each console in order to set the @blocked field. Console
>>>> printing threads will acquire their mutex while printing a
>>>> record. If @blocked was set, the thread will go back to sleep
>>>> instead of printing.
>>>>
>>>> The reason for the @blocked boolean field is so that
>>>> console_lock() callers do not need to acquire multiple console
>>>> mutexes simultaneously, which would introduce unnecessary
>>>> complexity due to nested mutex locking. Also, a new field
>>>> was chosen instead of adding a new @flags value so that the
>>>> blocked status could be checked without concern of reading
>>>> inconsistent values due to @flags updates from other contexts.
>>>>
>>>> Threaded console printers also need to synchronize against
>>>> console_trylock() callers. Since console_trylock() may be
>>>> called from any context, the per-console mutex cannot be used
>>>> for this synchronization. (mutex_trylock() cannot be called
>>>> from atomic contexts.) Introduce a global atomic counter to
>>>> identify if any threaded printers are active. The threaded
>>>> printers will also check the atomic counter to identify if the
>>>> console has been locked by another task via console_trylock().
>>>>
>>>> Note that @console_sem is still used to provide synchronization
>>>> between console_lock() and console_trylock() callers.
>>>>
>>>> A locking overview for console_lock(), console_trylock(), and the
>>>> threaded printers is as follows (pseudo code):
>>>>
>>>> console_lock()
>>>> {
>>>>          down(&console_sem);
>>>>          for_each_console(con) {
>>>>                  mutex_lock(&con->lock);
>>>>                  con->blocked = true;
>>>>                  mutex_unlock(&con->lock);
>>>>          }
>>>>          /* console_lock acquired */
>>>> }
>>>>
>>>> console_trylock()
>>>> {
>>>>          if (down_trylock(&console_sem) == 0) {
>>>>                  if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>>>>                          /* console_lock acquired */
>>>>                  }
>>>>          }
>>>> }
>>>>
>>>> threaded_printer()
>>>> {
>>>>          mutex_lock(&con->lock);
>>>>          if (!con->blocked) {
>>>>                  /* console_lock() callers blocked */
>>>>
>>>>                  if (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>>                          /* console_trylock() callers blocked */
>>>>
>>>>                          con->write();
>>>>
>>>>                          atomic_dec(&console_kthreads_active);
>>>>                  }
>>>>          }
>>>>          mutex_unlock(&con->lock);
>>>> }
>>>>
>>>> The console owner and waiter logic now only applies between contexts
>>>> that have taken the console_lock via console_trylock(). Threaded
>>>> printers never take the console_lock, so they do not have a
>>>> console_lock to handover. Tasks that have used console_lock() will
>>>> block the threaded printers using a mutex and if the console_lock
>>>> is handed over to an atomic context, it would be unable to unblock
>>>> the threaded printers. However, the console_trylock() case is
>>>> really the only scenario that is interesting for handovers anyway.
>>>>
>>>> @panic_console_dropped must change to atomic_t since it is no longer
>>>> protected exclusively by the console_lock.
>>>>
>>>> Since threaded printers remain asleep if they see that the console
>>>> is locked, they now must be explicitly woken in __console_unlock().
>>>> This means wake_up_klogd() calls following a console_unlock() are
>>>> no longer necessary and are removed.
>>>>
>>>> Also note that threaded printers no longer need to check
>>>> @console_suspended. The check for the @blocked field implicitly
>>>> covers the suspended console case.
>>>>
>>>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>>> Nice, it is better than v4. I am going to push this for linux-next.
>>>
>>> Reviewed-by: Petr Mladek <pmladek@suse.com>
>> JFYI, I have just pushed this patch instead of the one
>> from v4 into printk/linux.git, branch rework/kthreads.
>>
>> It means that this branch has been rebased. It will be
>> used in the next refresh of linux-next.
>
> This patchset landed in linux next-20220426. In my tests I've found 
> that it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: 
> Odroid C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system 
> boots to userspace and getty (with automated login) is executed. I 
> even see the bash prompt, but then the console freezes. Reverting
> this patch (e00cc0e1cbf4) on top of linux-next (together with 
> 6b3d71e87892 to make revert clean) fixes the issue.
>
The Amlogic Meson related issue has been investigated and fixed:

https://lore.kernel.org/all/b7c81f02-039e-e877-d7c3-6834728d2117@samsung.com/

but I just found that there is one more issue.


It appears on the Qualcomm-based DragonBoard 410c SBC
(arch/arm64/boot/dts/qcom/apq8016-sbc.dts). To see it on today's linux
next-20220506, one has to revert
42cd402b8fd4672b692400fe5f9eecd55d2794ac; otherwise lockdep triggers
another warning and is turned off too early:

================================
WARNING: inconsistent lock state
5.18.0-rc5-next-20220506+ #11869 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff80000aaa8478 (&port_lock_key){?.+.}-{2:2}, at: msm_uart_irq+0x38/0x750
{HARDIRQ-ON-W} state was registered at:
   lock_acquire.part.0+0xe0/0x230
   lock_acquire+0x68/0x84
   _raw_spin_lock+0x5c/0x80
   __msm_console_write+0x1ac/0x220
   msm_console_write+0x48/0x60
   __console_emit_next_record+0x188/0x420
   printk_kthread_func+0x3a0/0x3bc
   kthread+0x118/0x11c
   ret_from_fork+0x10/0x20
irq event stamp: 12182
hardirqs last  enabled at (12181): [<ffff800008e3d2a8>] cpuidle_enter_state+0xc4/0x30c

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc5-next-20220506+ #11869
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
  dump_backtrace.part.0+0xd0/0xe0
  show_stack+0x18/0x6c
  dump_stack_lvl+0x8c/0xb8
  dump_stack+0x18/0x34
  print_usage_bug.part.0+0x208/0x22c
  mark_lock+0x710/0x954
  __lock_acquire+0x9fc/0x20cc
  lock_acquire.part.0+0xe0/0x230
  lock_acquire+0x68/0x84
  _raw_spin_lock_irqsave+0x80/0xcc
  msm_uart_irq+0x38/0x750
  __handle_irq_event_percpu+0xac/0x3d0
  handle_irq_event+0x4c/0x120
  handle_fasteoi_irq+0xa4/0x1a0
  generic_handle_domain_irq+0x3c/0x60
  gic_handle_irq+0x44/0xc4
  call_on_irq_stack+0x2c/0x54
  do_interrupt_handler+0x80/0x84
  el1_interrupt+0x34/0x64
  el1h_64_irq_handler+0x18/0x24
  el1h_64_irq+0x64/0x68
  cpuidle_enter_state+0xcc/0x30c
  cpuidle_enter+0x38/0x50
  do_idle+0x22c/0x2bc
  cpu_startup_entry+0x28/0x30
  rest_init+0x110/0x190
  arch_post_acpi_subsys_init+0x0/0x18
  start_kernel+0x6c4/0x704
  __primary_switched+0xc0/0xc8
  INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.


Reverting the following patches on top of linux next-20220506 
fixes/hides this lockdep warning:

6b3d71e87892 ("printk: remove @console_locked")
8e274732115f ("printk: extend console_lock for per-console locking")
09c5ba0aa2fc ("printk: add kthread console printers")


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06 11:25               ` Marek Szyprowski
@ 2022-05-06 12:41                 ` John Ogness
  2022-05-06 13:04                   ` Marek Szyprowski
  0 siblings, 1 reply; 99+ messages in thread
From: John Ogness @ 2022-05-06 12:41 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-arm-msm, Bjorn Andersson,
	Andy Gross

On 2022-05-06, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> The Amlogic Meson related issue has been investigated and fixed:
>
> https://lore.kernel.org/all/b7c81f02-039e-e877-d7c3-6834728d2117@samsung.com/
>
> but I just found that there is one more issue.
>
> It appears on the Qualcomm-based DragonBoard 410c SBC
> (arch/arm64/boot/dts/qcom/apq8016-sbc.dts). To see it on today's linux
> next-20220506, one has to revert
> 42cd402b8fd4672b692400fe5f9eecd55d2794ac; otherwise lockdep triggers
> another warning and is turned off too early:
>
> ================================
> WARNING: inconsistent lock state
> 5.18.0-rc5-next-20220506+ #11869 Not tainted
> --------------------------------
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> ffff80000aaa8478 (&port_lock_key){?.+.}-{2:2}, at: msm_uart_irq+0x38/0x750
> {HARDIRQ-ON-W} state was registered at:
>    lock_acquire.part.0+0xe0/0x230
>    lock_acquire+0x68/0x84
>    _raw_spin_lock+0x5c/0x80
>    __msm_console_write+0x1ac/0x220
>    msm_console_write+0x48/0x60
>    __console_emit_next_record+0x188/0x420
>    printk_kthread_func+0x3a0/0x3bc
>    kthread+0x118/0x11c
>    ret_from_fork+0x10/0x20
> irq event stamp: 12182
> hardirqs last  enabled at (12181): [<ffff800008e3d2a8>] cpuidle_enter_state+0xc4/0x30c
>
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc5-next-20220506+ #11869
> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> Call trace:
>   dump_backtrace.part.0+0xd0/0xe0
>   show_stack+0x18/0x6c
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   print_usage_bug.part.0+0x208/0x22c
>   mark_lock+0x710/0x954
>   __lock_acquire+0x9fc/0x20cc
>   lock_acquire.part.0+0xe0/0x230
>   lock_acquire+0x68/0x84
>   _raw_spin_lock_irqsave+0x80/0xcc
>   msm_uart_irq+0x38/0x750
>   __handle_irq_event_percpu+0xac/0x3d0
>   handle_irq_event+0x4c/0x120
>   handle_fasteoi_irq+0xa4/0x1a0
>   generic_handle_domain_irq+0x3c/0x60
>   gic_handle_irq+0x44/0xc4
>   call_on_irq_stack+0x2c/0x54
>   do_interrupt_handler+0x80/0x84
>   el1_interrupt+0x34/0x64
>   el1h_64_irq_handler+0x18/0x24
>   el1h_64_irq+0x64/0x68
>   cpuidle_enter_state+0xcc/0x30c
>   cpuidle_enter+0x38/0x50
>   do_idle+0x22c/0x2bc
>   cpu_startup_entry+0x28/0x30
>   rest_init+0x110/0x190
>   arch_post_acpi_subsys_init+0x0/0x18
>   start_kernel+0x6c4/0x704
>   __primary_switched+0xc0/0xc8
>   INIT: version 2.88 booting
> [info] Using makefile-style concurrent boot in runlevel S.

The console write() callback for the msm driver (__msm_console_write)
assumes interrupts are off and is doing a spin_lock(&port->lock) rather
than spin_lock_irqsave(&port->lock, flags).

The following change should address the issue:

John

diff --git a/drivers/tty/serial/msm_serial.c b/drivers/tty/serial/msm_serial.c
index 23c94b927776..ab3f360bd354 100644
--- a/drivers/tty/serial/msm_serial.c
+++ b/drivers/tty/serial/msm_serial.c
@@ -1599,6 +1599,7 @@ static inline struct uart_port *msm_get_port_from_line(unsigned int line)
 static void __msm_console_write(struct uart_port *port, const char *s,
 				unsigned int count, bool is_uartdm)
 {
+	unsigned long flags;
 	int i;
 	int num_newlines = 0;
 	bool replaced = false;
@@ -1621,7 +1622,7 @@ static void __msm_console_write(struct uart_port *port, const char *s,
 	else if (oops_in_progress)
 		locked = spin_trylock(&port->lock);
 	else
-		spin_lock(&port->lock);
+		spin_lock_irqsave(&port->lock, flags);
 
 	if (is_uartdm)
 		msm_reset_dm_count(port, count);
@@ -1660,7 +1661,7 @@ static void __msm_console_write(struct uart_port *port, const char *s,
 	}
 
 	if (locked)
-		spin_unlock(&port->lock);
+		spin_unlock_irqrestore(&port->lock, flags);
 }
 
 static void msm_console_write(struct console *co, const char *s,

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06 12:41                 ` John Ogness
@ 2022-05-06 13:04                   ` Marek Szyprowski
  0 siblings, 0 replies; 99+ messages in thread
From: Marek Szyprowski @ 2022-05-06 13:04 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-arm-msm, Bjorn Andersson,
	Andy Gross

Hi John,

On 06.05.2022 14:41, John Ogness wrote:
> On 2022-05-06, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> The Amlogic Meson related issue has been investigated and fixed:
>>
>> https://lore.kernel.org/all/b7c81f02-039e-e877-d7c3-6834728d2117@samsung.com/
>>
>> but I just found that there is one more issue.
>>
>> It appears on the Qualcomm-based DragonBoard 410c SBC
>> (arch/arm64/boot/dts/qcom/apq8016-sbc.dts). To see it on today's linux
>> next-20220506, one has to revert
>> 42cd402b8fd4672b692400fe5f9eecd55d2794ac; otherwise lockdep triggers
>> another warning and is turned off too early:
>>
>> ================================
>> WARNING: inconsistent lock state
>> 5.18.0-rc5-next-20220506+ #11869 Not tainted
>> --------------------------------
>> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>> swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>> ffff80000aaa8478 (&port_lock_key){?.+.}-{2:2}, at: msm_uart_irq+0x38/0x750
>> {HARDIRQ-ON-W} state was registered at:
>>     lock_acquire.part.0+0xe0/0x230
>>     lock_acquire+0x68/0x84
>>     _raw_spin_lock+0x5c/0x80
>>     __msm_console_write+0x1ac/0x220
>>     msm_console_write+0x48/0x60
>>     __console_emit_next_record+0x188/0x420
>>     printk_kthread_func+0x3a0/0x3bc
>>     kthread+0x118/0x11c
>>     ret_from_fork+0x10/0x20
>> irq event stamp: 12182
>> hardirqs last  enabled at (12181): [<ffff800008e3d2a8>] cpuidle_enter_state+0xc4/0x30c
>>
>> stack backtrace:
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc5-next-20220506+ #11869
>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> Call trace:
>>    dump_backtrace.part.0+0xd0/0xe0
>>    show_stack+0x18/0x6c
>>    dump_stack_lvl+0x8c/0xb8
>>    dump_stack+0x18/0x34
>>    print_usage_bug.part.0+0x208/0x22c
>>    mark_lock+0x710/0x954
>>    __lock_acquire+0x9fc/0x20cc
>>    lock_acquire.part.0+0xe0/0x230
>>    lock_acquire+0x68/0x84
>>    _raw_spin_lock_irqsave+0x80/0xcc
>>    msm_uart_irq+0x38/0x750
>>    __handle_irq_event_percpu+0xac/0x3d0
>>    handle_irq_event+0x4c/0x120
>>    handle_fasteoi_irq+0xa4/0x1a0
>>    generic_handle_domain_irq+0x3c/0x60
>>    gic_handle_irq+0x44/0xc4
>>    call_on_irq_stack+0x2c/0x54
>>    do_interrupt_handler+0x80/0x84
>>    el1_interrupt+0x34/0x64
>>    el1h_64_irq_handler+0x18/0x24
>>    el1h_64_irq+0x64/0x68
>>    cpuidle_enter_state+0xcc/0x30c
>>    cpuidle_enter+0x38/0x50
>>    do_idle+0x22c/0x2bc
>>    cpu_startup_entry+0x28/0x30
>>    rest_init+0x110/0x190
>>    arch_post_acpi_subsys_init+0x0/0x18
>>    start_kernel+0x6c4/0x704
>>    __primary_switched+0xc0/0xc8
>>    INIT: version 2.88 booting
>> [info] Using makefile-style concurrent boot in runlevel S.
> The console write() callback for the msm driver (__msm_console_write)
> assumes interrupts are off and is doing a spin_lock(&port->lock) rather
> than spin_lock_irqsave(&port->lock, flags).
>
> The following change should address the issue:

Right, this helps. spin_trylock() should also be converted IMHO, see below.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> John
>
> diff --git a/drivers/tty/serial/msm_serial.c b/drivers/tty/serial/msm_serial.c
> index 23c94b927776..ab3f360bd354 100644
> --- a/drivers/tty/serial/msm_serial.c
> +++ b/drivers/tty/serial/msm_serial.c
> @@ -1599,6 +1599,7 @@ static inline struct uart_port *msm_get_port_from_line(unsigned int line)
>   static void __msm_console_write(struct uart_port *port, const char *s,
>   				unsigned int count, bool is_uartdm)
>   {
> +	unsigned long flags;
>   	int i;
>   	int num_newlines = 0;
>   	bool replaced = false;
> @@ -1621,7 +1622,7 @@ static void __msm_console_write(struct uart_port *port, const char *s,
>   	else if (oops_in_progress)
>   		locked = spin_trylock(&port->lock);

locked = spin_trylock_irqsave(&port->lock, flags);

>   	else
> -		spin_lock(&port->lock);
> +		spin_lock_irqsave(&port->lock, flags);
>   
>   	if (is_uartdm)
>   		msm_reset_dm_count(port, count);
> @@ -1660,7 +1661,7 @@ static void __msm_console_write(struct uart_port *port, const char *s,
>   	}
>   
>   	if (locked)
> -		spin_unlock(&port->lock);
> +		spin_unlock_irqrestore(&port->lock, flags);
>   }
>   
>   static void msm_console_write(struct console *co, const char *s,
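
As an illustration of why both the spin_lock() and the spin_trylock()
paths need the irqsave variants, here is a minimal userspace sketch. It
is not kernel code and not part of the proposed patch: the model_*
names are hypothetical stand-ins for the kernel primitives, and the
first branch (a sysrq check that skips locking) is an assumption,
because it is not visible in the quoted hunk. The function reports
whether the "hardware access" would run with the lock held while
interrupts are still enabled, which is the state behind the
{HARDIRQ-ON-W} -> {IN-HARDIRQ-W} report above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel primitives; every name is hypothetical. */
struct model_lock {
	bool held;
};

static bool model_irqs_enabled = true;	/* models the local CPU's IRQ state */

static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = flags != 0;
}

/* Like spin_lock_irqsave(); the model only supports the uncontended case. */
static void model_lock_irqsave(struct model_lock *l, unsigned long *flags)
{
	*flags = model_irq_save();
	assert(!l->held);	/* a real spinlock would spin here */
	l->held = true;
}

/* Like spin_trylock_irqsave(): IRQs stay off only if the lock was taken. */
static bool model_trylock_irqsave(struct model_lock *l, unsigned long *flags)
{
	*flags = model_irq_save();
	if (l->held) {
		model_irq_restore(*flags);
		return false;
	}
	l->held = true;
	return true;
}

static void model_unlock_irqrestore(struct model_lock *l, unsigned long flags)
{
	l->held = false;
	model_irq_restore(flags);
}

/*
 * Mirrors the branch structure of __msm_console_write() with both
 * conversions applied. Returns true if the lock is held while IRQs are
 * still enabled, i.e. an interrupt handler on this CPU could try to
 * take the same lock and deadlock.
 */
static bool model_console_write(struct model_lock *l, bool sysrq, bool oops)
{
	unsigned long flags = 0;
	bool locked = true;
	bool hazard;

	if (sysrq)
		locked = false;			/* assumed: caller already holds the lock */
	else if (oops)
		locked = model_trylock_irqsave(l, &flags);
	else
		model_lock_irqsave(l, &flags);

	hazard = locked && model_irqs_enabled;	/* the "hardware access" happens here */

	if (locked)
		model_unlock_irqrestore(l, flags);
	return hazard;
}
```

With the irqsave variants on every path, the function never reports the
hazard, and the saved flags are only restored on paths that actually
took the lock. Mixing spin_trylock() with spin_unlock_irqrestore()
would instead restore flags that were never saved.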

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  7:55                                       ` Neil Armstrong
@ 2022-05-08 11:02                                         ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-05-08 11:02 UTC (permalink / raw)
  To: Neil Armstrong, Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi Neil,

On 2022-05-06, Neil Armstrong <narmstrong@baylibre.com> wrote:
> Thanks all for figuring out the issue, perhaps other uart drivers
> could fall in the same issue if startup code isn't protected with
> lock?

When preparing for the official patch submission [0], I needed quite a
bit of time to understand why another function (meson_uart_reset) should
not and cannot acquire the port->lock.

I then started investigating some other drivers and indeed I see lots of
potential problems. Any console initializing port->lock from the
driver's probe() is probably wrong (and there are lots of them). But as
I've learned with the meson driver, the details are subtle. Each driver
will need to be carefully evaluated to see if it is actually safe.

uart_ops->startup() is called without holding port->lock. If the device
is a console, it is already registered and printing.

driver->probe() is called without holding port->lock. If the device is a
console, it is already registered and printing.

For both functions, port->lock might not be initialized yet, so blindly
acquiring it is wrong.

Note that this is not related to the introduction of kthread printing.

I've put it on my TODO list to go through the ~76 console drivers to
investigate their startup() and probe() implementations. But I will not
be able to do this quickly. My time might be better spent writing to all
the maintainers asking them to please verify the usage.

John Ogness

[0] https://lore.kernel.org/lkml/20220508103547.626355-1-john.ogness@linutronix.de

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:11                           ` John Ogness
@ 2022-06-08 15:10                             ` Geert Uytterhoeven
  -1 siblings, 0 replies; 99+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08 15:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, Linux Kernel Mailing List,
	Greg Kroah-Hartman, open list:ARM/Amlogic Meson...

Hi John,

On Mon, May 2, 2022 at 3:19 PM John Ogness <john.ogness@linutronix.de> wrote:
> On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Data written to /dev/kmsg and all kernel logs were always displayed
> > correctly. Also data written directly to /dev/ttyAML0 is displayed
> > properly on the console. The latter doesn't however trigger the input
> > related activity.
> >
> > It looks like the data read from the uart is delivered only if other
> > activity happens on the kernel console. If I type 'reboot' and press
> > enter, nothing happens immediately. If I then type 'date >/dev/ttyAML0'
> > via ssh, I only see the date printed on the console. However, if I
> > type 'date >/dev/kmsg', the date is printed and the reboot happens.
>
> I suppose if you login via ssh and check /proc/interrupts, then type
> some things over serial, then check /proc/interrupts again, you will see
> there have been no interrupts for the uart. But interrupts for other
> devices are happening. Is this correct?
>
> > For comparison, here is a 't' sysrq result from the 'working' serial
> > console (next-20220429), which happens usually 1 of 4 boots:
> >
> > https://pastebin.com/mp8zGFbW
>
> This still looks odd to me. We should be seeing a trace originating from
> ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.
>
> I wonder if the early creation of the thread is somehow causing
> problems. Could you try the following patch to see if it makes a
> difference? I would also like to see the sysrq-t output with this patch
> applied:

On one board, I'm seeing a new splat during early boot, pointing to
printk_activate_kthreads:

    Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=96000)
    pid_max: default: 32768 minimum: 301
    Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
    Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)

    =============================
    [ BUG: Invalid wait context ]
    5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
    -----------------------------
    swapper/0/1 is trying to lock:
    ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at: crng_make_state+0x148/0x1e4
    other info that might help us debug this:
    context-{5:5}
    2 locks held by swapper/0/1:
     #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at: printk_activate_kthreads+0x10/0x54
     #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at: __kfence_alloc+0x378/0x5c4
    stack backtrace:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
    Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
    Call trace:
     dump_backtrace.part.0+0x98/0xc0
     show_stack+0x14/0x28
     dump_stack_lvl+0xac/0xec
     dump_stack+0x14/0x2c
     __lock_acquire+0x388/0x10a0
     lock_acquire+0x190/0x2c0
     _raw_spin_lock_irqsave+0x6c/0x94
     crng_make_state+0x148/0x1e4
     _get_random_bytes.part.0+0x4c/0xe8
     get_random_u32+0x4c/0x140
     __kfence_alloc+0x460/0x5c4
     kmem_cache_alloc_trace+0x194/0x1dc
     __kthread_create_on_node+0x5c/0x1a8
     kthread_create_on_node+0x58/0x7c
     printk_start_kthread.part.0+0x34/0xa8
     printk_activate_kthreads+0x4c/0x54
     do_one_initcall+0xec/0x278
     kernel_init_freeable+0x11c/0x214
     kernel_init+0x24/0x124
     ret_from_fork+0x10/0x20
    rcu: Hierarchical SRCU implementation.
    printk: console [tty0] printing thread started
    EFI services will not be available.
    smp: Bringing up secondary CPUs ...
    Detected VIPT I-cache on CPU1
    CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
    smp: Brought up 1 node, 2 CPUs
    SMP: Total of 2 processors activated.

> ---------------- BEGIN PATCH ---------------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2311a0ad584a..c4362d25de22 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
>
>         return 0;
>  }
> -early_initcall(printk_activate_kthreads);
> +late_initcall(printk_activate_kthreads);
>
>  #if defined CONFIG_PRINTK
>  /* If @con is specified, only wait for that console. Otherwise wait for all. */
> ---------------- END PATCH ---------------

Doesn't seem to make much of a difference, only a slightly different
backtrace, compared to the above:

     Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
     Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
    +rcu: Hierarchical SRCU implementation.

     =============================
     [ BUG: Invalid wait context ]
    -5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
    +5.19.0-rc1-ebisu-00802-g06a0dd60d6e4-dirty #433 Not tainted
     -----------------------------
     swapper/0/1 is trying to lock:
     ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at: crng_make_state+0x148/0x1e4
     other info that might help us debug this:
     context-{5:5}
    -2 locks held by swapper/0/1:
    - #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at: printk_activate_kthreads+0x10/0x54
    - #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at: __kfence_alloc+0x378/0x5c4
    +1 lock held by swapper/0/1:
    + #0: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at: __kfence_alloc+0x378/0x5c4
     stack backtrace:
    -CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
    +CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-ebisu-00802-g06a0dd60d6e4-dirty #433
     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
     Call trace:
      dump_backtrace.part.0+0x98/0xc0
    @@ -33,20 +32,14 @@ Call trace:
      kmem_cache_alloc_trace+0x194/0x1dc
      __kthread_create_on_node+0x5c/0x1a8
      kthread_create_on_node+0x58/0x7c
    - printk_start_kthread.part.0+0x34/0xa8
    - printk_activate_kthreads+0x4c/0x54
    + rcu_spawn_gp_kthread+0x54/0x208
      do_one_initcall+0xec/0x278
      kernel_init_freeable+0x11c/0x214
      kernel_init+0x24/0x124
      ret_from_fork+0x10/0x20
    -rcu: Hierarchical SRCU implementation.
    -printk: console [tty0] printing thread started
     EFI services will not be available.
     smp: Bringing up secondary CPUs ...
     Detected VIPT I-cache on CPU1
     ...
    +printk: console [tty0] printing thread started

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-08 15:10                             ` Geert Uytterhoeven
@ 2022-06-09 11:19                               ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-06-09 11:19 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marek Szyprowski, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, Linux Kernel Mailing List,
	Greg Kroah-Hartman, open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Jason A. Donenfeld, Alexander Potapenko,
	Marco Elver, kasan-dev

(Added RANDOM NUMBER DRIVER and KFENCE people.)

Hi Geert,

On 2022-06-08, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>     =============================
>     [ BUG: Invalid wait context ]
>     5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
>     -----------------------------
>     swapper/0/1 is trying to lock:
>     ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at: crng_make_state+0x148/0x1e4
>     other info that might help us debug this:
>     context-{5:5}
>     2 locks held by swapper/0/1:
>      #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at: printk_activate_kthreads+0x10/0x54
>      #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at: __kfence_alloc+0x378/0x5c4
>     stack backtrace:
>     CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
>     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
>     Call trace:
>      dump_backtrace.part.0+0x98/0xc0
>      show_stack+0x14/0x28
>      dump_stack_lvl+0xac/0xec
>      dump_stack+0x14/0x2c
>      __lock_acquire+0x388/0x10a0
>      lock_acquire+0x190/0x2c0
>      _raw_spin_lock_irqsave+0x6c/0x94
>      crng_make_state+0x148/0x1e4
>      _get_random_bytes.part.0+0x4c/0xe8
>      get_random_u32+0x4c/0x140
>      __kfence_alloc+0x460/0x5c4
>      kmem_cache_alloc_trace+0x194/0x1dc
>      __kthread_create_on_node+0x5c/0x1a8
>      kthread_create_on_node+0x58/0x7c
>      printk_start_kthread.part.0+0x34/0xa8
>      printk_activate_kthreads+0x4c/0x54
>      do_one_initcall+0xec/0x278
>      kernel_init_freeable+0x11c/0x214
>      kernel_init+0x24/0x124
>      ret_from_fork+0x10/0x20

I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?

We are seeing a spinlock (base_crng.lock) taken while holding a
raw_spinlock (meta->lock).

kfence_guarded_alloc()
  raw_spin_trylock_irqsave(&meta->lock, flags)
    prandom_u32_max()
      prandom_u32()
        get_random_u32()
          get_random_bytes()
            _get_random_bytes()
              crng_make_state()
                spin_lock_irqsave(&base_crng.lock, flags);
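
The rule that trips here can be sketched as a small userspace model.
This is a simplification under stated assumptions, not lockdep's actual
code: the enum values are chosen to match the {outer:inner} numbers
printed in the splat ({2:2} for the raw spinlock, {3:3} for the
spinlock, context-{5:5} for task context), and the names wait_type,
lock_class_model and wait_context_ok() are hypothetical. The idea is
that the most restrictive inner wait type among the held locks bounds
what may be acquired next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wait types, numbered as they appear in the splat's {outer:inner} pairs. */
enum wait_type {
	WAIT_INV = 0,	/* not annotated; ignored by the check */
	WAIT_FREE,	/* 1: wait-free */
	WAIT_SPIN,	/* 2: raw_spinlock_t */
	WAIT_CONFIG,	/* 3: spinlock_t, which may sleep on PREEMPT_RT */
	WAIT_SLEEP,	/* 4: mutexes and other sleeping locks */
	WAIT_MAX,	/* 5: plain task context */
};

struct lock_class_model {
	const char *name;
	enum wait_type outer;
	enum wait_type inner;
};

/*
 * Returns true if acquiring @next is allowed while holding @held[0..nheld)
 * in a context whose wait type is @context.
 */
static bool wait_context_ok(enum wait_type context,
			    const struct lock_class_model *held, size_t nheld,
			    const struct lock_class_model *next)
{
	enum wait_type curr_inner = context;
	enum wait_type next_outer = next->outer ? next->outer : next->inner;
	size_t i;

	if (next->inner == WAIT_INV)
		return true;	/* unannotated lock: nothing to check */

	/* The tightest held lock determines what may nest inside. */
	for (i = 0; i < nheld; i++) {
		if (held[i].inner != WAIT_INV && held[i].inner < curr_inner)
			curr_inner = held[i].inner;
	}

	return next_outer <= curr_inner;
}
```

Fed with the locks from the report, console_lock {0:0} is skipped,
meta->lock {2:2} lowers the bound to 2, and base_crng.lock {3:3}
exceeds it, so the model rejects the acquisition. Without meta->lock in
the held list, the same acquisition passes.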

I expect it is allowed to create kthreads via kthread_run() in
early_initcalls.

John Ogness

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:19                               ` John Ogness
@ 2022-06-09 11:58                                 ` Jason A. Donenfeld
  -1 siblings, 0 replies; 99+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 11:58 UTC (permalink / raw)
  To: John Ogness
  Cc: Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev

Hi John,

On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> (Added RANDOM NUMBER DRIVER and KFENCE people.)

Thanks.

> I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> 
> We are seeing a spinlock (base_crng.lock) taken while holding a
> raw_spinlock (meta->lock).
> 
> kfence_guarded_alloc()
>   raw_spin_trylock_irqsave(&meta->lock, flags)
>     prandom_u32_max()
>       prandom_u32()
>         get_random_u32()
>           get_random_bytes()
>             _get_random_bytes()
>               crng_make_state()
>                 spin_lock_irqsave(&base_crng.lock, flags);
> 
> I expect it is allowed to create kthreads via kthread_run() in
> early_initcalls.

AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
But nobody who wants an RT kernel will be using KFENCE. So this seems
like a non-issue? Maybe just add a `depends on !KFENCE` to
PROVE_RAW_LOCK_NESTING?

Jason

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:58                                 ` Jason A. Donenfeld
@ 2022-06-09 12:18                                   ` Dmitry Vyukov
  -1 siblings, 0 replies; 99+ messages in thread
From: Dmitry Vyukov @ 2022-06-09 12:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev

On Thu, 9 Jun 2022 at 13:59, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi John,
>
> On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> > (Added RANDOM NUMBER DRIVER and KFENCE people.)
>
> Thanks.
>
> > I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> >
> > We are seeing a spinlock (base_crng.lock) taken while holding a
> > raw_spinlock (meta->lock).
> >
> > kfence_guarded_alloc()
> >   raw_spin_trylock_irqsave(&meta->lock, flags)
> >     prandom_u32_max()
> >       prandom_u32()
> >         get_random_u32()
> >           get_random_bytes()
> >             _get_random_bytes()
> >               crng_make_state()
> >                 spin_lock_irqsave(&base_crng.lock, flags);
> >
> > I expect it is allowed to create kthreads via kthread_run() in
> > early_initcalls.
>
> AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> But nobody who wants an RT kernel will be using KFENCE. So this seems
> like a non-issue? Maybe just add a `depends on !KFENCE` to
> PROVE_RAW_LOCK_NESTING?

Don't know if there are other good solutions (of similar simplicity).
But fwiw this is not about the target production environment. Real
production uses of RT kernels will probably not enable LOCKDEP,
PROVE_RAW_LOCK_NESTING and other debugging configs.
This is about detecting as many bugs as possible in testing
environments. And testing environments can well have both LOCKDEP and
KFENCE enabled. Any such limitation will require doubling the number
of tested configurations.

Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
enabled on testing systems? We don't have it enabled on syzbot.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:58                                 ` Jason A. Donenfeld
@ 2022-06-09 12:18                                   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 99+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 12:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev

Hey again,

On Thu, Jun 09, 2022 at 01:58:44PM +0200, Jason A. Donenfeld wrote:
> Hi John,
> 
> On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> > (Added RANDOM NUMBER DRIVER and KFENCE people.)
> 
> Thanks.
> 
> > I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> > 
> > We are seeing a spinlock (base_crng.lock) taken while holding a
> > raw_spinlock (meta->lock).
> > 
> > kfence_guarded_alloc()
> >   raw_spin_trylock_irqsave(&meta->lock, flags)
> >     prandom_u32_max()
> >       prandom_u32()
> >         get_random_u32()
> >           get_random_bytes()
> >             _get_random_bytes()
> >               crng_make_state()
> >                 spin_lock_irqsave(&base_crng.lock, flags);
> > 
> > I expect it is allowed to create kthreads via kthread_run() in
> > early_initcalls.
> 
> AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> But nobody who wants an RT kernel will be using KFENCE. So this seems
> like a non-issue? Maybe just add a `depends on !KFENCE` to
> PROVE_RAW_LOCK_NESTING?

On second thought, the fix is trivial:
https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

Jason

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:18                                   ` Dmitry Vyukov
@ 2022-06-09 12:27                                     ` Jason A. Donenfeld
  -1 siblings, 0 replies; 99+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 12:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev,
	bigeasy

Hi Dmitry,

On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > PROVE_RAW_LOCK_NESTING?
> 
> Don't know if there are other good solutions (of similar simplicity).

Fortunately, I found one that solves things without needing to
compromise on anything:
https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

> Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> enabled on testing systems? We don't have it enabled on syzbot.

Last time I spoke with RT people about this, the goal was eventually to
*always* enable it when lock proving is enabled, but there are too many
bugs and cases now to do that, so it's an opt-in. I might be
misremembering, though, so CC'ing Sebastian in case he wants to chime
in.

Jason

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:27                                     ` Jason A. Donenfeld
@ 2022-06-09 12:32                                       ` Dmitry Vyukov
  -1 siblings, 0 replies; 99+ messages in thread
From: Dmitry Vyukov @ 2022-06-09 12:32 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev,
	bigeasy

On Thu, 9 Jun 2022 at 14:27, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Dmitry,
>
> On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > > PROVE_RAW_LOCK_NESTING?
> >
> > Don't know if there are other good solutions (of similar simplicity).
>
> Fortunately, I found one that solves things without needing to
> compromise on anything:
> https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

Cool! Thanks!

> > Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> > enabled on testing systems? We don't have it enabled on syzbot.
>
> Last time I spoke with RT people about this, the goal was eventually to
> *always* enable it when lock proving is enabled, but there are too many
> bugs and cases now to do that, so it's an opt-in. I might be
> misremembering, though, so CC'ing Sebastian in case he wants to chime
> in.

OK, we will wait then.
Little point in doubling the number of reports for known issues.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:27                                     ` Jason A. Donenfeld
@ 2022-06-17 16:51                                       ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 99+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-17 16:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Dmitry Vyukov, John Ogness, Geert Uytterhoeven, Marek Szyprowski,
	Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...,
	Theodore Ts'o, Alexander Potapenko, Marco Elver, kasan-dev

On 2022-06-09 14:27:11 [+0200], Jason A. Donenfeld wrote:
> Hi Dmitry,
> 
> On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > > PROVE_RAW_LOCK_NESTING?
> > 
> > Don't know if there are other good solutions (of similar simplicity).
> 
> Fortunately, I found one that solves things without needing to
> compromise on anything:
> https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/
> 
> > Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> > enabled on testing systems? We don't have it enabled on syzbot.
> 
> Last time I spoke with RT people about this, the goal was eventually to
> *always* enable it when lock proving is enabled, but there are too many
> bugs and cases now to do that, so it's an opt-in. I might be
> misremembering, though, so CC'ing Sebastian in case he wants to chime
> in.

That is basically still the case. If CONFIG_PROVE_RAW_LOCK_NESTING yells
then there will be yelling on PREEMPT_RT, too. We would like to get
things fixed ;)
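
For readers following the lock-nesting discussion, here is a minimal
userspace sketch of the rule that this check enforces. It is not lockdep's
code: the wait classes, acquire() and release_all() are names made up for
illustration, loosely modelled on lockdep's wait types. The assumption is
that a lock may only be taken if no lock already held belongs to a
stricter class.

```c
#include <stdbool.h>

/* Simplified wait classes (hypothetical, loosely modelled on lockdep). */
enum wait_type {
	WAIT_SPIN = 1,   /* raw_spinlock_t: spins even on PREEMPT_RT */
	WAIT_CONFIG = 2, /* spinlock_t: becomes a sleeping lock on PREEMPT_RT */
	WAIT_SLEEP = 3,  /* mutex-like locks: may always sleep */
};

struct held_lock {
	const char *name;
	enum wait_type type;
};

#define MAX_HELD 8
static struct held_lock held[MAX_HELD];
static int nr_held;

/* The new lock is allowed only if every held lock is at least as permissive. */
static bool nesting_ok(enum wait_type next)
{
	for (int i = 0; i < nr_held; i++)
		if (held[i].type < next)
			return false;
	return true;
}

/* Record the lock and return true, or return false on invalid nesting. */
static bool acquire(const char *name, enum wait_type type)
{
	if (nr_held >= MAX_HELD || !nesting_ok(type))
		return false;
	held[nr_held].name = name;
	held[nr_held].type = type;
	nr_held++;
	return true;
}

/* Drop every recorded lock so a new scenario can be replayed. */
static void release_all(void)
{
	nr_held = 0;
}
```

In this model, taking meta->lock as WAIT_SPIN and then base_crng.lock as
WAIT_CONFIG is rejected, which is the nesting reported earlier in this
thread. The reverse order is accepted.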

Without going through this thread, John is looking at printk and printk
triggers a few of those. That is one of the reasons why this is not enabled
by default.

> Jason

Sebastian

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-25 20:58   ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking John Ogness
@ 2022-06-22  9:03       ` Geert Uytterhoeven
  2022-06-22  9:03       ` Geert Uytterhoeven
  1 sibling, 0 replies; 99+ messages in thread
From: Geert Uytterhoeven @ 2022-06-22  9:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman, linux-riscv

Hi John,

On Tue, Apr 26, 2022 at 10:50 AM John Ogness <john.ogness@linutronix.de> wrote:
> Currently threaded console printers synchronize against each
> other using console_lock(). However, different console drivers
> are unrelated and do not require any synchronization between
> each other. Removing the synchronization between the threaded
> console printers will allow each console to print at its own
> speed.

[...]

> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Thanks for your patch, which is now commit 8e274732115f63c1
("printk: extend console_lock for per-console locking") in
v5.19-rc1.

I have bisected another intriguing issue to this commit: on SiPEED
MAiX BiT (Canaan K210 riscv), it no longer prints the line detecting
ttySIF0, i.e. the console output changes like:

     spi-nor spi1.0: gd25lq128d (16384 Kbytes)
     i2c_dev: i2c /dev entries driver
     k210-fpioa 502b0000.pinmux: K210 FPIOA pin controller
    -38000000.serial: ttySIF0 at MMIO 0x38000000 (irq = 1, base_baud = 115200) is a SiFive UART v0
     printk: console [ttySIF0] enabled
     printk: bootconsole [sifive0] disabled
     printk: console [ttySIF0] printing thread started

As this patch does not make any changes to drivers/tty/, and ttySIF0
does work (it's the console), I looked in /proc/kmsg, and bingo,
the missing line is there, so it is generated, but never printed.

I tried taking the port spinlock in sifive_serial_startup(), as
suggested for the meson driver, but that doesn't make a difference.

Do you have a clue?
Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-22  9:03       ` Geert Uytterhoeven
@ 2022-06-22 22:37         ` John Ogness
  -1 siblings, 0 replies; 99+ messages in thread
From: John Ogness @ 2022-06-22 22:37 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman, linux-riscv

On 2022-06-22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> I have bisected another intriguing issue to this commit: on SiPEED
> MAiX BiT (Canaan K210 riscv), it no longer prints the line detecting
> ttySIF0, i.e. the console output changes like:
>
>      spi-nor spi1.0: gd25lq128d (16384 Kbytes)
>      i2c_dev: i2c /dev entries driver
>      k210-fpioa 502b0000.pinmux: K210 FPIOA pin controller
>     -38000000.serial: ttySIF0 at MMIO 0x38000000 (irq = 1, base_baud = 115200) is a SiFive UART v0
>      printk: console [ttySIF0] enabled
>      printk: bootconsole [sifive0] disabled
>      printk: console [ttySIF0] printing thread started
>
> As this patch does not make any changes to drivers/tty/, and ttySIF0
> does work (it's the console), I looked in /proc/kmsg, and bingo,
> the missing line is there, so it is generated, but never printed.

What is sifive0? Are you using the earlycon driver to create an early
boot console? Can I see your boot args?

There is a known issue that the earlycon does not synchronize with
normal consoles. A patch was recently posted [0] on LKML.

> I tried taking the port spinlock in sifive_serial_startup(), as
> suggested for the meson driver, but that doesn't make a difference.

It may not have made a difference for you, but it should be there.
sifive_serial_startup() is writing to SIFIVE_SERIAL_IE_OFFS without
taking port->lock. sifive_serial_console_write() also writes to this
register (under port->lock). This could lead to RX watermark interrupts
being disabled for some time. The problem is not as bad as it was with
the meson driver because __ssp_enable_rxwm() is updating the shadow
copy. But still, it is a bug. And anyway we shouldn't have 2 CPUs
writing to a register simultaneously.
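
To make the race concrete, here is a small userspace model. It is only a
sketch, not the sifive driver: struct fake_port, IE_RXWM and all function
names are hypothetical, a pthread mutex stands in for port->lock, and it
is assumed for illustration that the console-write path saves, masks and
then restores the interrupt-enable value.

```c
#include <pthread.h>
#include <stdint.h>

#define IE_RXWM 0x2u /* hypothetical bit: RX watermark interrupt enable */

struct fake_port {
	pthread_mutex_t lock; /* stands in for port->lock */
	uint32_t ie_reg;      /* stands in for the hardware IE register */
	uint32_t ie_shadow;   /* software copy of the intended IE value */
};

static void fake_port_init(struct fake_port *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->ie_reg = 0;
	p->ie_shadow = 0;
}

/* Startup path: enable RX watermark interrupts while holding the lock. */
static void fake_startup(struct fake_port *p)
{
	pthread_mutex_lock(&p->lock);
	p->ie_shadow |= IE_RXWM;
	p->ie_reg = p->ie_shadow;
	pthread_mutex_unlock(&p->lock);
}

/* Console write path: mask interrupts while printing, then restore. */
static void fake_console_write(struct fake_port *p)
{
	pthread_mutex_lock(&p->lock);
	uint32_t saved = p->ie_reg;
	p->ie_reg = 0;
	/* ... characters would be pushed out here ... */
	p->ie_reg = saved;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Single-threaded replay of the unlocked interleaving: the startup
 * write lands in the middle of a console write. Returns the final
 * register value.
 */
static uint32_t replay_unlocked_race(void)
{
	struct fake_port p;

	fake_port_init(&p);
	uint32_t saved = p.ie_reg; /* console write: save (RXWM not set yet) */
	p.ie_reg = 0;              /* console write: mask */
	p.ie_shadow |= IE_RXWM;    /* startup, no lock: enable RXWM */
	p.ie_reg = p.ie_shadow;
	p.ie_reg = saved;          /* console write: restore stale value */
	return p.ie_reg;
}
```

In the replay, the register ends up without IE_RXWM although the shadow
copy still has it, so a later write of the shadow value would repair the
register. When both paths take the lock, as fake_startup() and
fake_console_write() do, the bit survives in either order.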

John Ogness

[0] https://lore.kernel.org/lkml/20220621090900.GB7891@pathway.suse.cz

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-22 22:37         ` John Ogness
@ 2022-06-23 10:10           ` Geert Uytterhoeven
  -1 siblings, 0 replies; 99+ messages in thread
From: Geert Uytterhoeven @ 2022-06-23 10:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman, linux-riscv

Hi John,

On Thu, Jun 23, 2022 at 12:37 AM John Ogness <john.ogness@linutronix.de> wrote:
> On 2022-06-22, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > I have bisected another intriguing issue to this commit: on SiPEED
> > MAiX BiT (Canaan K210 riscv), it no longer prints the line detecting
> > ttySIF0, i.e. the console output changes like:
> >
> >      spi-nor spi1.0: gd25lq128d (16384 Kbytes)
> >      i2c_dev: i2c /dev entries driver
> >      k210-fpioa 502b0000.pinmux: K210 FPIOA pin controller
> >     -38000000.serial: ttySIF0 at MMIO 0x38000000 (irq = 1, base_baud = 115200) is a SiFive UART v0
> >      printk: console [ttySIF0] enabled
> >      printk: bootconsole [sifive0] disabled
> >      printk: console [ttySIF0] printing thread started
> >
> > As this patch does not make any changes to drivers/tty/, and ttySIF0
> > does work (it's the console), I looked in /proc/kmsg, and bingo,
> > the missing line is there, so it is generated, but never printed.
>
> What is sifive0? Are you using the earlycon driver to create an early
> boot console? Can I see your boot args?

earlycon: sifive0 at MMIO 0x0000000038000000 (options '115200n8')
printk: bootconsole [sifive0] enabled
Kernel command line: earlycon console=ttySIF0 rootdelay=2 root=/dev/mmcblk0p1 ro

> There is a known issue that the earlycon does not synchronize with
> normal consoles. A patch was recently posted [0] on LKML.

> [0] https://lore.kernel.org/lkml/20220621090900.GB7891@pathway.suse.cz

Thank you, that did the trick!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


Thread overview: 99+ messages
2022-04-21 21:22 [PATCH printk v4 00/15] implement threaded console printing John Ogness
2022-04-21 21:22 ` [PATCH printk v4 01/15] printk: rename cpulock functions John Ogness
2022-04-21 21:22 ` [PATCH printk v4 02/15] printk: cpu sync always disable interrupts John Ogness
2022-04-21 21:22 ` [PATCH printk v4 03/15] printk: add missing memory barrier to wake_up_klogd() John Ogness
2022-04-21 21:22 ` [PATCH printk v4 04/15] printk: wake up all waiters John Ogness
2022-04-21 21:22 ` [PATCH printk v4 05/15] printk: wake waiters for safe and NMI contexts John Ogness
2022-04-21 21:22 ` [PATCH printk v4 06/15] printk: get caller_id/timestamp after migration disable John Ogness
2022-04-21 21:22 ` [PATCH printk v4 07/15] printk: call boot_delay_msec() in printk_delay() John Ogness
2022-04-21 21:22 ` [PATCH printk v4 08/15] printk: add con_printk() macro for console details John Ogness
2022-04-21 21:22 ` [PATCH printk v4 09/15] printk: refactor and rework printing logic John Ogness
2022-04-21 21:22 ` [PATCH printk v4 10/15] printk: move buffer definitions into console_emit_next_record() caller John Ogness
2022-04-21 21:22 ` [PATCH printk v4 11/15] printk: add pr_flush() John Ogness
2022-04-21 21:22 ` [PATCH printk v4 12/15] printk: add functions to prefer direct printing John Ogness
2022-04-21 21:22 ` [PATCH printk v4 13/15] printk: add kthread console printers John Ogness
2022-04-22  7:48   ` Petr Mladek
2022-04-21 21:22 ` [PATCH printk v4 14/15] printk: extend console_lock for proper kthread support John Ogness
2022-04-21 21:40   ` John Ogness
2022-04-22  9:21   ` Petr Mladek
2022-04-25 20:58   ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking John Ogness
2022-04-26 12:07     ` Petr Mladek
2022-04-26 13:16       ` Petr Mladek
     [not found]         ` <CGME20220427070833eucas1p27a32ce7c41c0da26f05bd52155f0031c@eucas1p2.samsung.com>
2022-04-27  7:08           ` Marek Szyprowski
2022-04-27  7:38             ` Petr Mladek
2022-04-27 11:44               ` Marek Szyprowski
2022-04-27 16:15                 ` John Ogness
2022-04-27 16:48                   ` Petr Mladek
2022-04-28 14:54                   ` Petr Mladek
2022-04-29 13:53                   ` Marek Szyprowski
2022-04-30 16:00                     ` John Ogness
2022-05-02  9:19                       ` Marek Szyprowski
2022-05-02 13:11                         ` John Ogness
2022-05-02 22:29                           ` Marek Szyprowski
2022-05-04  5:56                             ` John Ogness
2022-05-04  6:52                               ` Marek Szyprowski
2022-06-08 15:10                           ` Geert Uytterhoeven
2022-06-09 11:19                             ` John Ogness
2022-06-09 11:58                               ` Jason A. Donenfeld
2022-06-09 12:18                                 ` Dmitry Vyukov
2022-06-09 12:27                                   ` Jason A. Donenfeld
2022-06-09 12:32                                     ` Dmitry Vyukov
2022-06-17 16:51                                     ` Sebastian Andrzej Siewior
2022-06-09 12:18                                 ` Jason A. Donenfeld
2022-05-02 13:17                         ` Petr Mladek
2022-05-02 23:13                           ` Marek Szyprowski
2022-05-03  6:49                             ` Petr Mladek
2022-05-04  6:05                               ` Marek Szyprowski
2022-05-04 21:11                             ` John Ogness
2022-05-04 22:42                               ` John Ogness
2022-05-05 22:33                                 ` John Ogness
2022-05-06  6:43                                   ` Marek Szyprowski
2022-05-06  7:55                                     ` Neil Armstrong
2022-05-08 11:02                                       ` John Ogness
2022-05-06  8:16                                     ` Petr Mladek
2022-05-06  9:20                                     ` John Ogness
     [not found]             ` <CGME20220506112526eucas1p2a3688f87d3ed8331b99f2f876bf6c2f6@eucas1p2.samsung.com>
2022-05-06 11:25               ` Marek Szyprowski
2022-05-06 12:41                 ` John Ogness
2022-05-06 13:04                   ` Marek Szyprowski
2022-06-22  9:03     ` Geert Uytterhoeven
2022-06-22 22:37       ` John Ogness
2022-06-23 10:10         ` Geert Uytterhoeven
2022-04-21 21:22 ` [PATCH printk v4 15/15] printk: remove @console_locked John Ogness
2022-04-22  9:39 ` [PATCH printk v4 00/15] implement threaded console printing Petr Mladek
2022-04-22 20:29   ` Petr Mladek
