* [PATCH v10 0/2] printk: Make printk() completely async
@ 2016-04-04 16:57 Sergey Senozhatsky
  2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky
  2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky
  0 siblings, 2 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

This patch set makes printk() completely asynchronous: new messages
are appended to the kernel printk buffer, but instead of 'direct'
printing, the actual print job is performed by a dedicated kthread.
This has the advantage that printing always happens from a schedulable
context, and thus we don't lock up any particular CPU or even interrupts.

against next-20160404

v10:
-- simplify printk_kthread_need_flush_console (Jan, Petr)

v9:
-- move need_flush_console assignment down in vprintk_emit (Jan)
-- simplify need_flush_console assignment rules (Petr)
-- clear need_flush_console in printing function (Petr)
-- rename need_flush_console (Petr)

v8:
-- rename kthread printing function (Petr)
-- clear need_flush_console in console_unlock() under logbuf (Petr)

v7:
-- do not set global printk_sync in panic in vprintk_emit() (Petr)
-- simplify vprintk_emit(). drop some of local variables (Petr)
-- move handling of LOGLEVEL_SCHED messages back to printk_deferred()
   so we wake_up_process()/console_trylock() in vprintk_emit() only
   for !in_sched messages

v6:
-- move wake_up_process out of logbuf lock (Jan, Byungchul)
-- do not disable async printk in recursion handling code.
-- rebase against next-20160321 (w/NMI patches)

v5:
-- make printk.synchronous RO (Petr)
-- make printing_func() correct and do not use wait_queue (Petr)
-- do not panic() when can't allocate printing thread (Petr)
-- do not wake_up_process() only in IRQ, prefer vprintk_emit() (Jan)
-- move wake_up_klogd_work_func() to a separate patch (Petr)
-- move wake_up_process() under logbuf lock so printk recursion logic can
   help us out
-- switch to sync_print mode if printk recursion occurred
-- drop "printk: Skip messages on oops" patch

v4:
-- do not directly wake_up() the printing kthread from vprintk_emit(), need
   to go via IRQ->wake_up() to avoid sched deadlocks (Jan)

v3:
-- use a dedicated kthread for printing instead of using wq (Jan, Tetsuo, Tejun)

v2:
- use dedicated printk workqueue with WQ_MEM_RECLAIM bit
- fallback to system-wide workqueue only if allocation of printk_wq has
  failed
- do not use system_wq as a fallback wq. both console_lock() and console_unlock()
  can spend a significant amount of time; so we need to use system_long_wq.
- rework sync/!sync detection logic
  a) we can have deferred (in_sched) messages before we allocate printk_wq,
     so the only way to handle those messages is via IRQ context
  b) even in printk.synchronous mode, deferred messages must not be printed
     directly, and should go via IRQ context
  c) even if we allocated printk_wq and have !sync_printk mode, we must route
     deferred messages via IRQ context
- so this adds additional bool flags to vprintk_emit() and introduces a new
  pending bit to `printk_pending'
- fix build on !PRINTK configs


Jan Kara (2):
  printk: Make printk() completely async
  printk: Make wake_up_klogd_work_func() async

 Documentation/kernel-parameters.txt |  10 ++++
 kernel/printk/printk.c              | 100 ++++++++++++++++++++++++++++++++----
 2 files changed, 100 insertions(+), 10 deletions(-)

-- 
2.8.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky
@ 2016-04-04 16:57 ` Sergey Senozhatsky
  2016-04-04 22:51   ` Andrew Morton
  2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky
  1 sibling, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, printk() sometimes waits for the message to be printed to the
console and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to the
console, it prints messages from the kernel printk buffer until the buffer
is empty. When a serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend an unbounded
amount of time doing printing in console_unlock().  This is an especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed that a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered), resulting in RCU stalls (the CPU doing the printing doesn't
reach a quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually machine death
(as messages from stalls and lockups are appended to the printk buffer faster
than we are able to print them). So these machines are unable to boot with a
serial console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before the kernel manages to
discover all the attached HW. Also, during artificial stress testing, a SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends the message to the kernel
printk buffer and wake_up()s a special dedicated kthread to do the
printing to the console. This has the advantage that printing always happens
from a schedulable context and thus we don't lock up any particular CPU or
even interrupts. It also has the advantage that printk() is fast and
thus kernel booting is not slowed down by a slow serial console. The
disadvantage of this method is that in case of a crash there is a higher
chance that important messages won't appear in the console output (we may
need working scheduling to print a message to the console). We somewhat
mitigate this risk by switching printk to the original method of
immediate printing to the console if an oops is in progress.  Also, for
debugging purposes, we provide a printk.synchronous kernel parameter which
restores the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including OOM corner cases when all of the workers are busy
(e.g. allocating memory); thus printk() uses its own dedicated printing
kthread rather than relying on a workqueue (even with the WQ_MEM_RECLAIM bit
set, we can potentially see delays in printing until the workqueue
declares a ->mayday, as noted by Tetsuo Handa).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt | 10 +++++
 kernel/printk/printk.c              | 87 ++++++++++++++++++++++++++++++++++---
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 3d28b50..c23a5bd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default, kernel messages are printed to the console
+			asynchronously (except during early boot or when an oops
+			is happening). That avoids the kernel stalling behind a
+			slow serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However, for debugging problems, printing messages to
+			the console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284..f63dfe4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously.
+ * The default value on UP systems is 'true'.
+ */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async printk */
+static struct task_struct *printk_kthread;
+/* When `true' - printing thread has messages to print */
+static bool printk_kthread_need_flush_console;
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -1757,12 +1772,29 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (!in_sched) {
 		lockdep_off();
 		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
+		 * By default we print message to console asynchronously so
+		 * that kernel doesn't get stalled due to slow serial console.
+		 * That can lead to softlockups, lost interrupts, or userspace
+		 * timing out under heavy printing load.
+		 *
+		 * However we resort to synchronous printing of messages during
+		 * early boot, when synchronous printing was explicitly
+		 * requested by kernel parameter, or when console_verbose() was
+		 * called to print everything during panic / oops.
 		 */
-		if (console_trylock())
-			console_unlock();
+		if (!in_panic && printk_kthread) {
+			/* Offload printing to a schedulable context. */
+			printk_kthread_need_flush_console = true;
+			wake_up_process(printk_kthread);
+		} else {
+			/*
+			 * Try to acquire and then immediately release the
+			 * console semaphore.  The release will print out
+			 * buffers and wake up /dev/kmsg and syslog() users.
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 		lockdep_on();
 	}
 
@@ -2722,6 +2754,47 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
+static int printk_kthread_func(void *data)
+{
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!printk_kthread_need_flush_console)
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+		/*
+		 * Avoid an infinite loop when console_unlock() cannot
+		 * access consoles, e.g. because console_suspended is
+		 * true. In that case schedule(); someone else will print
+		 * the messages from resume_console().
+		 */
+		printk_kthread_need_flush_console = false;
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_kthread(void)
+{
+	struct task_struct *thread;
+
+	if (printk_sync)
+		return 0;
+
+	thread = kthread_run(printk_kthread_func, NULL, "printk");
+	if (IS_ERR(thread)) {
+		pr_err("printk: unable to create printing thread\n");
+		printk_sync = true;
+	} else {
+		printk_kthread = thread;
+	}
+	return 0;
+}
+late_initcall(init_printk_kthread);
+
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-- 
2.8.0


* [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async
  2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky
  2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky
@ 2016-04-04 16:57 ` Sergey Senozhatsky
  1 sibling, 0 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

Offload printing of scheduler-deferred messages from IRQ context
to a schedulable printing kthread, when possible (the same way we
do it in vprintk_emit()). Otherwise, the CPU can spend an unbounded
amount of time printing in console_unlock() from IRQ context.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f63dfe4..c364f18 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2808,9 +2808,16 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
 	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+		if (printk_kthread) {
+			wake_up_process(printk_kthread);
+		} else {
+			/*
+			 * If trylock fails, someone else is doing
+			 * the printing
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
-- 
2.8.0


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky
@ 2016-04-04 22:51   ` Andrew Morton
  2016-04-05  5:17     ` Sergey Senozhatsky
  2016-04-06  8:27     ` Jan Kara
  0 siblings, 2 replies; 40+ messages in thread
From: Andrew Morton @ 2016-04-04 22:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Jan Kara

On Tue,  5 Apr 2016 01:57:27 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> From: Jan Kara <jack@suse.cz>
> 
> Currently, printk() sometimes waits for the message to be printed to the
> console and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to the
> console, it prints messages from the kernel printk buffer until the buffer
> is empty. When a serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend an unbounded
> amount of time doing printing in console_unlock().  This is an especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed that a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered), resulting in RCU stalls (the CPU doing the printing doesn't
> reach a quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually machine death
> (as messages from stalls and lockups are appended to the printk buffer faster
> than we are able to print them). So these machines are unable to boot with a
> serial console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before the kernel manages to
> discover all the attached HW. Also, during artificial stress testing, a SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends the message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to the console. This has the advantage that printing always happens
> from a schedulable context and thus we don't lock up any particular CPU or
> even interrupts. It also has the advantage that printk() is fast and
> thus kernel booting is not slowed down by a slow serial console. The
> disadvantage of this method is that in case of a crash there is a higher
> chance that important messages won't appear in the console output (we may
> need working scheduling to print a message to the console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to the console if an oops is in progress.  Also, for
> debugging purposes, we provide a printk.synchronous kernel parameter which
> restores the original printk behavior.
> 
> printk() is expected to work under different conditions and in different
> scenarios, including OOM corner cases when all of the workers are busy
> (e.g. allocating memory); thus printk() uses its own dedicated printing
> kthread rather than relying on a workqueue (even with the WQ_MEM_RECLAIM bit
> set, we can potentially see delays in printing until the workqueue
> declares a ->mayday, as noted by Tetsuo Handa).

The whole idea remains worrisome.  It is definitely making printk()
less reliable in the vast majority of cases: what happens if the
scheduler is busted or random memory has been scribbled on, etc.

All this downside to handle (afaict) one special case.  Surely there is
another way?  For example (but feel free to suggest other approaches!)
can we put some limit on the number of extra characters which the
printing task will print?  Once that limit is hit, new printk callers
will spin until they can get in and do some printing themselves.  Or
something else?

> index 3d28b50..c23a5bd 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default, kernel messages are printed to the console
> +			asynchronously (except during early boot or when an oops
> +			is happening). That avoids the kernel stalling behind a
> +			slow serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However, for debugging problems, printing messages to
> +			the console immediately may be desirable. This option
> +			enables such behavior.

Well, it's good that we have this.

It would be better if it was runtime-controllable - changing boot
parameters is a bit of a pain.  In fact with this approach, your
zillions-of-scsi-disks scenario becomes less problematic: do the async
offloading during the boot process then switch back to the more
reliable sync printing late in boot.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/utsname.h>
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm-generic/sections.h>
> @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously.
> + * The default value on UP systems is 'true'.

That's rather obvious from the code.  Comments should explain "why",
not "what".

> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async printk */
> +static struct task_struct *printk_kthread;
> +/* When `true' - printing thread has messages to print */
> +static bool printk_kthread_need_flush_console;
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
>  			    const char *fmt, va_list args)
>  {
> +	/* cpu currently holding logbuf_lock in this function */
> +	static unsigned int logbuf_cpu = UINT_MAX;
>  	static bool recursion_bug;
>  	static char textbuf[LOG_LINE_MAX];
>  	char *text = textbuf;
> @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	int this_cpu;
>  	int printed_len = 0;
>  	bool in_sched = false;
> -	/* cpu currently holding logbuf_lock in this function */
> -	static unsigned int logbuf_cpu = UINT_MAX;
> +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
>
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> @@ -1757,12 +1772,29 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	if (!in_sched) {
>  		lockdep_off();
>  		/*
> -		 * Try to acquire and then immediately release the console
> -		 * semaphore.  The release will print out buffers and wake up
> -		 * /dev/kmsg and syslog() users.
> +		 * By default we print message to console asynchronously so

Nit: this comment down here shouldn't know what the default is.  That
should be documented up at the printk_sync definition site.

> +		 * that kernel doesn't get stalled due to slow serial console.

s/kernel/the kernel/

> +		 * That can lead to softlockups, lost interrupts, or userspace
> +		 * timing out under heavy printing load.
> +		 *
> +		 * However we resort to synchronous printing of messages during
> +		 * early boot, when synchronous printing was explicitly
> +		 * requested by kernel parameter, or when console_verbose() was

s/kernel/a kernel/


> +		 * called to print everything during panic / oops.

We're missing a description of *why* console_verbose() is handled
specially.

>  		 */
> -		if (console_trylock())
> -			console_unlock();
> +		if (!in_panic && printk_kthread) {

We don't really need local variable in_panic.  I guess it has some
documentary value.

> +			/* Offload printing to a schedulable context. */
> +			printk_kthread_need_flush_console = true;
> +			wake_up_process(printk_kthread);
> +		} else {
> +			/*
> +			 * Try to acquire and then immediately release the
> +			 * console semaphore.  The release will print out
> +			 * buffers and wake up /dev/kmsg and syslog() users.
> +			 */
> +			if (console_trylock())
> +				console_unlock();
> +		}
>  		lockdep_on();
>  	}
>  
> @@ -2722,6 +2754,47 @@ static int __init printk_late_init(void)
>  late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
> +static int printk_kthread_func(void *data)
> +{
> +	while (1) {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		if (!printk_kthread_need_flush_console)
> +			schedule();
> +
> +		__set_current_state(TASK_RUNNING);
> +		/*
> +		 * Avoid an infinite loop when console_unlock() cannot
> +		 * access consoles, e.g. because console_suspended is
> +		 * true. In that case schedule(); someone else will print
> +		 * the messages from resume_console().
> +		 */
> +		printk_kthread_need_flush_console = false;
> +
> +		console_lock();
> +		console_unlock();
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init init_printk_kthread(void)
> +{
> +	struct task_struct *thread;
> +
> +	if (printk_sync)
> +		return 0;
> +
> +	thread = kthread_run(printk_kthread_func, NULL, "printk");

This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
task will block printk for ever.  This seems bad.

> +	if (IS_ERR(thread)) {
> +		pr_err("printk: unable to create printing thread\n");
> +		printk_sync = true;
> +	} else {
> +		printk_kthread = thread;
> +	}
> +	return 0;
> +}
> +late_initcall(init_printk_kthread);

Could do with a comment explaining why late_initcall was chosen.


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-04 22:51   ` Andrew Morton
@ 2016-04-05  5:17     ` Sergey Senozhatsky
  2016-04-05  7:39       ` Sergey Senozhatsky
  2016-04-06  8:27     ` Jan Kara
  1 sibling, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-05  5:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo,
	Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky,
	Jan Kara

Hello Andrew,

On (04/04/16 15:51), Andrew Morton wrote:
[..]
> The whole idea remains worrisome.  It is definitely making printk()
> less reliable in the vast majority of cases: what happens if the
> scheduler is busted or random memory has been scribbled on, etc.

yes.

well, printk, in some sense, already depends on the scheduler: the console
semaphore on its own; cond_resched() from console_unlock() with console_sem
being locked, etc. nor is memory corruption something that printk() can
always handle nicely. I saw logbuf_lock corruption and a recursive spin_dump
from vprintk_emit(); it was quite dramatic.


> All this downside to handle (afaict) one special case.

well, it's not just to make zillions-of-scsi-disks happy. I'm facing
different, more general scenarios: 'moderate printk abuse'
(combined with a slow serial console). printk is not always friendly
and shiny; it has some secrets and it can bite easily (lockups, stalls,
starvations, sched throttling, et cetera), and this is not something
that every developer knows.

> Surely there is another way?  For example (but feel free to suggest other
> approaches!) can we put some limit on the number of extra characters which
> the printing task will print?  Once that limit is hit, new printk callers
> will spin until they can get in and do some printing themselves.  Or
> something else?

hm... there are not so many options, I think. this busy wait, depending on
the number of CPUs (and some other factors), can provoke mass softlockups
on other CPUs and a number of other bad things. printk() and its deferred
version can be called from any context, so in some cases spinning in printk is
as good as doing the console_unlock()->call_console_drivers() loop (iow, not
really good). things are not so bad (well, Tetsuo has some issues here though)
if printk() is called from non-atomic context, since now we have cond_resched()
in console_unlock() for console_lock()/console_trylock() callers; but printk()
from atomic context is still a problem -- we need to offload the actual
printing, unless we can guarantee that every atomic printk will be followed
by a non-atomic printk (which will do the printing).

> > +	printk.synchronous=
..
> Well, it's good that we have this.
> 
> It would be better if it was runtime-controllable - changing boot
> parameters is a bit of a pain.  In fact with this approach, your
> zillions-of-scsi-disks scenario becomes less problematic: do the async
> offloading during the boot process then switch back to the more
> reliable sync printing late in boot.

well, I can add it if you insist.
my personal opinion is to keep it RO; an RO->RW transition is easier than
RW->RO. giving control over printk behaviour to user space can
potentially be even worse than drop_caches. besides, I couldn't clearly
understand based on what observations user space would decide to switch
printk back to sync mode; and what may cause user space to switch printk
back from sync to async... lockups in dmesg output? any hint?

..
> > + * When true, printing to console will happen synchronously.
> > + * The default value on UP systems is 'true'.
> 
> That's rather obvious from the code.  Comments should explain "why",
> not "what".

fair enough.

> > +		 * By default we print message to console asynchronously so
> 
> Nit: this comment down here shouldn't know what the default is.  That
> should be documented up at the printk_sync definition site.

ok.

> > +		 * that kernel doesn't get stalled due to slow serial console.
> 
> s/kernel/the kernel/

ok.

> > +		 * requested by kernel parameter, or when console_verbose() was
> 
> s/kernel/a kernel/

ok.

> 
> > +		 * called to print everything during panic / oops.
> 
> We're missing a description of *why* console_verbose() is handled
> specially.

ok.

> > -		if (console_trylock())
> > -			console_unlock();
> > +		if (!in_panic && printk_kthread) {
> 
> We don't really need local variable in_panic.  I guess it has some
> documentary value.

just a shorter version of "console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH".

> > +
> > +	thread = kthread_run(printk_kthread_func, NULL, "printk");
> 
> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> task will block printk for ever.  This seems bad.

yes, using SCHED_RR/FIFO policy here makes sense.

> > +late_initcall(init_printk_kthread);
> 
> Could do with a comment explaining why late_initcall was chosen.

late_initcall was chosen because of workqueue's early_initcall, and
I just decided not to change it when I switched from a wq to a
dedicated printk kthread. late_initcall seemed to be OK. can do
init_printk_kthread() somewhere in init/main.c start_kernel().

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-05  5:17     ` Sergey Senozhatsky
@ 2016-04-05  7:39       ` Sergey Senozhatsky
  2016-04-06  0:19         ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-05  7:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Sergey Senozhatsky, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Jan Kara

On (04/05/16 14:17), Sergey Senozhatsky wrote:
[..]
> > Could do with a comment explaining why late_initcall was chosen.
> 
> late_initcall was chosen because of workqueue early_initcall, and
> I just decided not to change it when I switched from wq to a
> dedicated printk kthread. late_initcall seemed to be OK. can do
> init_printk_kthread() somewhere in init/main start_kernel().

or rather move it a bit earlier. core_initcall sounds appropriate,
or postcore_initcall. don't want to export yet another printk symbol.

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-05  7:39       ` Sergey Senozhatsky
@ 2016-04-06  0:19         ` Sergey Senozhatsky
  0 siblings, 0 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-06  0:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Jan Kara


Hello,

so I currently have something like this.
untouched things:
  - module param is RO; offload printing to kthread

(I'll wait for opinions/replies).

=======

From 1e7de1f9a590d8f23609f943362768e4c14580cc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 5 Apr 2016 01:57:27 +0900
Subject: [PATCH] printk: Make printk() completely async

Currently, printk() sometimes waits for the message to be printed to the
console and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to the
console, it prints messages from the kernel printk buffer until the buffer
is empty. When a serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend an unbounded
amount of time doing printing in console_unlock().  This is an especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered), resulting in RCU stalls (the CPU doing the printing
doesn't reach a quiescent state for a long time), softlockup reports
(IPIs for the printing CPU don't get served and thus other CPUs spin
waiting for the printing CPU to process IPIs), and eventually machine
death (as messages from the stalls and lockups are appended to the
printk buffer faster than we are able to print them). So these machines
are unable to boot with a serial console attached. Another observed
issue is that due to slow printk, hardware discovery is slow and udev
times out before the kernel manages to discover all the attached
hardware. Also, during artificial stress testing, a SATA disk disappears
from the system because its interrupts aren't served for too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends the message to the kernel
printk buffer and wake_up()s a special dedicated kthread to do the
printing to the console. This has the advantage that printing always
happens from a schedulable context and thus we don't lock up any
particular CPU or even interrupts. It also has the advantage that
printk() is fast and thus kernel booting is not slowed down by a slow
serial console. The disadvantage of this method is that in case of a
crash there is a higher chance that important messages won't appear in
the console output (we may need working scheduling to print a message
to the console). We somewhat mitigate this risk by switching printk to
the original method of immediate printing to the console if an oops is
in progress. Also, for debugging purposes, we provide the
printk.synchronous kernel parameter, which reverts to the original
printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory). Thus printk() uses its own dedicated printing
kthread rather than relying on a workqueue (even with the WQ_MEM_RECLAIM
bit set, printing can be delayed until the workqueue declares a ->mayday,
as noted by Tetsuo Handa).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt | 10 ++++
 kernel/printk/printk.c              | 94 ++++++++++++++++++++++++++++++++++---
 2 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 46c0b19..733a13d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3117,6 +3117,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to the console
+			asynchronously (except during early boot or when an
+			oops is happening). That avoids the kernel stalling
+			behind a slow serial console, and thus softlockups,
+			interrupt timeouts, or userspace timing out during
+			heavy printing. However, for debugging, printing
+			messages to the console immediately may be desirable.
+			This option enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284..217f27a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,16 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/* Control whether printing to console must be synchronous. */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async printk */
+static struct task_struct *printk_kthread;
+/* When `true', the printing thread has messages to print */
+static bool printk_kthread_need_flush_console;
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1608,6 +1619,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1617,8 +1630,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -1757,12 +1768,35 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (!in_sched) {
 		lockdep_off();
 		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
+		 * Attempt to print the messages to console asynchronously so
+		 * that the kernel doesn't get stalled due to slow serial
+		 * console. That can lead to softlockups, lost interrupts, or
+		 * userspace timing out under heavy printing load.
+		 *
+		 * However we resort to synchronous printing of messages during
+		 * early boot, when synchronous printing was explicitly
+		 * requested by a kernel parameter, or when console_verbose()
+		 * was called to print everything during panic / oops.
+		 * Unlike bust_spinlocks() and oops_in_progress,
+		 * console_verbose() sets console_loglevel to MOTORMOUTH and
+		 * never clears it, while oops_in_progress can go back to 0,
+		 * switching printk back to async mode; we want printk to
+		 * operate in sync mode once panic() occurred.
 		 */
-		if (console_trylock())
-			console_unlock();
+		if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
+				printk_kthread) {
+			/* Offload printing to a schedulable context. */
+			printk_kthread_need_flush_console = true;
+			wake_up_process(printk_kthread);
+		} else {
+			/*
+			 * Try to acquire and then immediately release the
+			 * console semaphore.  The release will print out
+			 * buffers and wake up /dev/kmsg and syslog() users.
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 		lockdep_on();
 	}
 
@@ -2722,6 +2756,52 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
+static int printk_kthread_func(void *data)
+{
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!printk_kthread_need_flush_console)
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+		/*
+		 * Avoid an infinite loop when console_unlock() cannot
+		 * access consoles, e.g. because console_suspended is
+		 * true. In that case schedule(); someone else will print
+		 * the messages from resume_console().
+		 */
+		printk_kthread_need_flush_console = false;
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_kthread(void)
+{
+	struct task_struct *thread;
+
+	if (printk_sync)
+		return 0;
+
+	thread = kthread_run(printk_kthread_func, NULL, "printk");
+	if (IS_ERR(thread)) {
+		pr_err("printk: unable to create printing thread\n");
+		printk_sync = true;
+	} else {
+		struct sched_param param = {
+			.sched_priority = MAX_RT_PRIO - 1,
+		};
+
+		sched_setscheduler(thread, SCHED_FIFO, &param);
+		printk_kthread = thread;
+	}
+	return 0;
+}
+core_initcall(init_printk_kthread);
+
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-- 
2.8.0.rc0.1.gd285ab0


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-04 22:51   ` Andrew Morton
  2016-04-05  5:17     ` Sergey Senozhatsky
@ 2016-04-06  8:27     ` Jan Kara
  2016-04-07  9:48       ` Sergey Senozhatsky
  2016-08-10 21:17       ` Viresh Kumar
  1 sibling, 2 replies; 40+ messages in thread
From: Jan Kara @ 2016-04-06  8:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo,
	Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky,
	Jan Kara

On Mon 04-04-16 15:51:49, Andrew Morton wrote:
> On Tue,  5 Apr 2016 01:57:27 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> 
> > From: Jan Kara <jack@suse.cz>
> > 
> > Currently, printk() sometimes waits for message to be printed to console
> > and sometimes it does not (when console_sem is held by some other
> > process). In case printk() grabs console_sem and starts printing to
> > console, it prints messages from kernel printk buffer until the buffer
> > is empty. When serial console is attached, printing is slow and thus
> > other CPUs in the system have plenty of time to append new messages to
> > the buffer while one CPU is printing. Thus the CPU can spend unbounded
> > amount of time doing printing in console_unlock().  This is especially
> > serious problem if the printk() calling console_unlock() was called with
> > interrupts disabled.
> > 
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Another observed issue is that due to slow printk,
> > hardware discovery is slow and udev times out before kernel manages to
> > discover all the attached HW. Also during artificial stress testing SATA
> > disk disappears from the system because its interrupts aren't served for
> > too long.
> > 
> > This patch makes printk() completely asynchronous (similar to what
> > printk_deferred() did until now). It appends message to the kernel
> > printk buffer and wake_up()s a special dedicated kthread to do the
> > printing to console. This has the advantage that printing always happens
> > from a schedulable context and thus we don't lock up any particular CPU or
> > even interrupts. Also it has the advantage that printk() is fast and
> > thus kernel booting is not slowed down by slow serial console.
> > Disadvantage of this method is that in case of crash there is higher
> > chance that important messages won't appear in console output (we may
> > need working scheduling to print message to console). We somewhat
> > mitigate this risk by switching printk to the original method of
> > immediate printing to console if oops is in progress.  Also for
> > debugging purposes we provide printk.synchronous kernel parameter which
> > resorts to the original printk behavior.
> > 
> > printk() is expected to work under different conditions and in different
> > scenarios, including corner cases of OOM when all of the workers are busy
> > (e.g. allocating memory), thus printk() uses its own dedicated printing
> > kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> > set we potentially can receive delays in printing until workqueue
> > declares a ->mayday, as noted by Tetsuo Handa).
> 
> The whole idea remains worrisome.  It is definitely making printk()
> less reliable in the vast majority of cases: what happens if the
> scheduler is busted or random memory has been scribbled on, etc.

So the scheduler dependency was always there due to handling of
console_sem and some console drivers possibly scheduling out while holding
console_sem. Recently the dependency has increased by adding cond_resched()
into console_unlock() loop in 8d91f8b15361 (printk: do cond_resched()
between lines while outputting to consoles). Also the changelog of that
commit explains another case besides 'tons of SCSI devices' which leads to
printk lockups. I have also seen a case where printing of 'martian packets'
killed the machine because it was too intensive (but that was an older
kernel and I think networking people have since added some ratelimiting
for those messages, so it shouldn't happen anymore).

Now this commit makes the dependency on the scheduler even harder - unless
we know something is wrong (in_panic set) we rely on scheduler to schedule
the printing kthread. So I agree there is some risk in this patch.

> All this downside to handle (afaict) one special case.  Surely there is
> another way?  For example (but feel free to suggest other approaches!)
> can we put some limit on the number of extra characters which the
> printing task will print?  Once that limit is hit, new printk callers
> will spin until they can get in and do some printing themselves.  Or
> something else?

We have been through this a few times without much success. Last revision
of my patches did what you suggest but although the idea sounds simple, the
implementation turned out to be relatively complex and even you didn't
quite like it [1]. Moreover when I raised this issue at Kernel Summit 2015,
Hannes Reinecke mentioned that even these patches are not enough to make
his huge machine boot with serial console because printing still delays
device probing too much and systemd times out during boot. Then Linus said
that he'd want to keep things simple and just make printk asynchronous and
switch to synchronous in case problems are detected - which is what
Sergey and I have implemented in this patch.

So currently I'm of the opinion that we should merge this patch and see how
well it works in reality. I've tested it and it works for me and Sergey.
Maybe I can try breaking it some more by crashing the kernel in various
ways if it makes you calmer. In case we see problems in some setups,
there's always the kernel option to return to the original behavior and we
can work from there improving what we have...

[1] Message-Id: <20150918151411.a3fa65c3e4f33f9f2ddf1fd8@linux-foundation.org>

> > index 3d28b50..c23a5bd 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  	printk.time=	Show timing data prefixed to each printk message line
> >  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
> >  
> > +	printk.synchronous=
> > +			By default kernel messages are printed to console
> > +			asynchronously (except during early boot or when oops
> > +			is happening). That avoids kernel stalling behind slow
> > +			serial console and thus avoids softlockups, interrupt
> > +			timeouts, or userspace timing out during heavy printing.
> > +			However for debugging problems, printing messages to
> > +			console immediately may be desirable. This option
> > +			enables such behavior.
> 
> Well, it's good that we have this.
> 
> It would be better if it was runtime-controllable - changing boot
> parameters is a bit of a pain.  In fact with this approach, your
> zillions-of-scsi-disks scenario becomes less problematic: do the async
> offloading during the boot process then switch back to the more
> reliable sync printing late in boot.

Doing this should be relatively easy. It would be userspace's decision
whether they want more reliable or faster printk. Sounds fine with me.

> > +static int __init init_printk_kthread(void)
> > +{
> > +	struct task_struct *thread;
> > +
> > +	if (printk_sync)
> > +		return 0;
> > +
> > +	thread = kthread_run(printk_kthread_func, NULL, "printk");
> 
> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> task will block printk for ever.  This seems bad.

I have to research this a bit, but won't the SCHED_FIFO task that has
a potentially unbounded amount of work lock up the CPU even though it
does occasional cond_resched()?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-06  8:27     ` Jan Kara
@ 2016-04-07  9:48       ` Sergey Senozhatsky
  2016-04-07 12:08         ` Sergey Senozhatsky
  2016-08-10 21:17       ` Viresh Kumar
  1 sibling, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-07  9:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Sergey Senozhatsky, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	Sergey Senozhatsky

Hello,

On (04/06/16 10:27), Jan Kara wrote:
[..]
> > Well, it's good that we have this.
> > 
> > It would be better if it was runtime-controllable - changing boot
> > parameters is a bit of a pain.  In fact with this approach, your
> > zillions-of-scsi-disks scenario becomes less problematic: do the async
> > offloading during the boot process then switch back to the more
> > reliable sync printing late in boot.
> 
> Doing this should be relatively easy. It would be userspace's decision
> whether they want more reliable or faster printk. Sounds fine with me.

I can add it as a separate patch to the series. should be quite trivial.

I have [minor] concerns, though. I can see how, for example, user space
can decide what logging level it wants '1 4 4 7' or anything else, but
how can user space decide what printk implementation it wants to use?

I'm more or less positive not to back-port that `synchronous RW' patch
to the kernels that I use; just because I don't want to give this freedom
to people, sync printk is something I'm trying to run away from.


> > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> > task will block printk for ever.  This seems bad.
> 
> I have to research this a bit but won't the SCHED_FIFO task that has
> potentially unbounded amount of work lockup the CPU even though it does
> occasional cond_resched()?

depending on `watchdog_thresh' value, it can take something like 20+
seconds before watchdog will notice softlockup.
so I'm setting printk kthread prio to `MAX_RT_PRIO - 1' as of now,
just in case.

I think I'll leave printk kthread init as a late_initcall. probably
would prefer core/arch/device init calls to happen in sync printk mode.

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-07  9:48       ` Sergey Senozhatsky
@ 2016-04-07 12:08         ` Sergey Senozhatsky
  2016-04-07 13:15           ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-04-07 12:08 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park

Hello,

On (04/07/16 18:48), Sergey Senozhatsky wrote:
> On (04/06/16 10:27), Jan Kara wrote:
> [..]
> > > Well, it's good that we have this.
> > > 
> > > It would be better if it was runtime-controllable - changing boot
> > > parameters is a bit of a pain.  In fact with this approach, your
> > > zillions-of-scsi-disks scenario becomes less problematic: do the async
> > > offloading during the boot process then switch back to the more
> > > reliable sync printing late in boot.
> > 
> > Doing this should be relatively easy. It would be userspace's decision
> > whether they want more reliable or faster printk. Sounds fine with me.

ok, after some thinking -- it makes a lot of sense to have it. good old
sync printk is potentially more reliable after all. I think I also now
want to make the 'default' for printk_sync being 'true', so we won't
spoil printk on the systems that never had any problems with it. at
least for one release cycle, maybe. thus, people would need to request
async printk via a boot param and switch back to sync printk once the booting
process is done [or keep async printk].

how does that sound?

	-ss

> > > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> > > task will block printk for ever.  This seems bad.
> > 
> > I have to research this a bit but won't the SCHED_FIFO task that has
> > potentially unbounded amount of work lockup the CPU even though it does
> > occasional cond_resched()?
> 
> depending on `watchdog_thresh' value, it can take something like 20+
> seconds before watchdog will notice softlockup.
> so I'm setting printk kthread prio to `MAX_RT_PRIO - 1' as of now,
> just in case.
> 
> I think I'll leave printk kthread init as a late_initcall. probably
> would prefer core/arch/device init calls to happen in sync printk mode.


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-07 12:08         ` Sergey Senozhatsky
@ 2016-04-07 13:15           ` Jan Kara
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2016-04-07 13:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park

On Thu 07-04-16 21:08:26, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/07/16 18:48), Sergey Senozhatsky wrote:
> > On (04/06/16 10:27), Jan Kara wrote:
> > [..]
> > > > Well, it's good that we have this.
> > > > 
> > > > It would be better if it was runtime-controllable - changing boot
> > > > parameters is a bit of a pain.  In fact with this approach, your
> > > > zillions-of-scsi-disks scenario becomes less problematic: do the async
> > > > offloading during the boot process then switch back to the more
> > > > reliable sync printing late in boot.
> > > 
> > > Doing this should be relatively easy. It would be userspace's decision
> > > whether they want more reliable or faster printk. Sounds fine with me.
> 
> ok, after some thinking -- it makes a lot of sense to have it. good old
> sync printk is potentially more reliable after all. I think I also now
> want to make the 'default' for printk_sync being 'true'. so we won't
> spoil printk on the systems that never had any problems with it. at
> least for one release cycle, may be. thus, people would need to request
> async printk via boot param and switch back to sync printk once the booting
> process is done [or keep async printk].
> 
> how does that sound?

That is fine with me. We can always enable this by default in our distro so
that it gets more exposure and big machines are able to boot...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-04-06  8:27     ` Jan Kara
  2016-04-07  9:48       ` Sergey Senozhatsky
@ 2016-08-10 21:17       ` Viresh Kumar
  2016-08-12  9:44         ` Petr Mladek
  1 sibling, 1 reply; 40+ messages in thread
From: Viresh Kumar @ 2016-08-10 21:17 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton, Sergey Senozhatsky
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, vlevenetz,
	Greg Kroah-Hartman

+Vladi/Greg,

On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote:
> On Mon 04-04-16 15:51:49, Andrew Morton wrote:

>> > +static int __init init_printk_kthread(void)
>> > +{
>> > +   struct task_struct *thread;
>> > +
>> > +   if (printk_sync)
>> > +           return 0;
>> > +
>> > +   thread = kthread_run(printk_kthread_func, NULL, "printk");
>>
>> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
>> task will block printk for ever.  This seems bad.
>
> I have to research this a bit but won't the SCHED_FIFO task that has
> potentially unbounded amount of work lockup the CPU even though it does
> occasional cond_resched()?

We are facing complete CPU hogs because the printk thread is a SCHED_FIFO
task, and we have this patch to fix it up for now.

Author: Vladislav Levenetz <vblagoev@mm-sol.com>
Date:   Wed Aug 10 13:58:00 2016 -0700

    SW-7786: printk: Lower the priority of printk thread

    Flooding the console (with a test module) in a tight loop indefinitely
    makes the android user interface very sluggish. Opening the YouTube
    app makes the device even more unresponsive, to the point where it
    completely hangs.

    The asynchronous printk thread is a SCHED FIFO thread with priority
    MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
    instead, we observe much better performance using the same printk flood
    test. We don't even notice any kind of sluggishness during device usage.
    We can play a YouTube clip smoothly and use the device normally in
    general.  The kernel log looks fine as well, as the flood of messages
    continue normally.

    Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com>
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/printk/printk.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c32872872cb6..ad5b30e5e6d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
 static int __init_printk_kthread(void)
 {
        struct task_struct *thread;
-       struct sched_param param = {
-               .sched_priority = MAX_RT_PRIO - 1,
-       };

        if (!printk_kthread_can_run || printk_sync || printk_kthread)
                return 0;
@@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
                return PTR_ERR(thread);
        }

-       sched_setscheduler(thread, SCHED_FIFO, &param);
        printk_kthread = thread;
        return 0;
 }


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-10 21:17       ` Viresh Kumar
@ 2016-08-12  9:44         ` Petr Mladek
  2016-08-15 14:26           ` Vladislav Levenetz
  2016-08-18  2:27           ` Sergey Senozhatsky
  0 siblings, 2 replies; 40+ messages in thread
From: Petr Mladek @ 2016-08-12  9:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo,
	Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky,
	vlevenetz, Greg Kroah-Hartman

On Wed 2016-08-10 14:17:55, Viresh Kumar wrote:
> +Vladi/Greg,
> 
> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote:
> > On Mon 04-04-16 15:51:49, Andrew Morton wrote:
> 
> >> > +static int __init init_printk_kthread(void)
> >> > +{
> >> > +   struct task_struct *thread;
> >> > +
> >> > +   if (printk_sync)
> >> > +           return 0;
> >> > +
> >> > +   thread = kthread_run(printk_kthread_func, NULL, "printk");
> >>
> >> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> >> task will block printk for ever.  This seems bad.
> >
> > I have to research this a bit but won't the SCHED_FIFO task that has
> > potentially unbounded amount of work lockup the CPU even though it does
> > occasional cond_resched()?
> 
> We are facing complete hogs because of the printk thread being a SCHED_FIFO
> task and have this patch to fix it up for now.
> 
> Author: Vladislav Levenetz <vblagoev@mm-sol.com>
> Date:   Wed Aug 10 13:58:00 2016 -0700
> 
>     SW-7786: printk: Lower the priority of printk thread
> 
>     Flooding the console (with a test module) in a tight loop indefinitely
>     makes android user interface very sluggish. Opening YouTube app and the
>     device hangs and becomes even more unresponsive to the point it
>     completely hangs.
> 
>     The asynchronous printk thread is a SCHED FIFO thread with priority
>     MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
>     instead, we observe much better performance using the same printk flood
>     test. We don't even notice any kind of sluggishness during device usage.
>     We can play a YouTube clip smoothly and use the device normally in
>     general.  The kernel log looks fine as well, as the flood of messages
>     continue normally.
> 
>     Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com>
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/printk/printk.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c32872872cb6..ad5b30e5e6d9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
>  static int __init_printk_kthread(void)
>  {
>         struct task_struct *thread;
> -       struct sched_param param = {
> -               .sched_priority = MAX_RT_PRIO - 1,
> -       };
> 
>         if (!printk_kthread_can_run || printk_sync || printk_kthread)
>                 return 0;
> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
>                 return PTR_ERR(thread);
>         }
> 
> -       sched_setscheduler(thread, SCHED_FIFO, &param);
>         printk_kthread = thread;
>         return 0;
>  }

IMHO, this is fine. We force the synchronous mode in critical
situations anyway.

But I was curious if we could hit a printk from the wake_up_process().
The change above causes using the fair scheduler and there is
the following call chain [*]

  vprintk_emit()
  -> wake_up_process()
   -> try_to_wake_up()
    -> ttwu_queue()
     -> ttwu_do_activate()
      -> ttwu_activate()
       -> activate_task()
	-> enqueue_task()
	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
	  -> cfs_rq_of()
	   -> task_of()
	    -> WARN_ON_ONCE(!entity_is_task(se))

We should never trigger this because printk_kthread is a task.
But what if the data gets inconsistent?

Then there is the following chain:

  vprintk_emit()
  -> wake_up_process()
   -> try_to_wake_up()
    -> ttwu_queue()
     -> ttwu_do_activate()
      -> ttwu_activate()
       -> activate_task()
	-> enqueue_task()
	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
	  ->hrtick_update()
	   -> hrtick_start_fair()
	    -> WARN_ON(task_rq(p) != rq)

This looks like another paranoid consistency check that might be
triggered when the scheduler state gets messed up.

I see few possible solutions:

1. Replace the WARN_ONs by printk_deferred().

   This is the usual solution but it would make debugging less convenient.


2. Force synchronous printk inside WARN()/BUG() macros.

   This would make sense even from other reasons. These are printed
   when the system is in a strange state. There is no guarantee that
   the printk_kthread will get scheduled.


3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
   printk_func.

   It might be elegant. But we do not want this outside the scheduler
   code. Therefore we would need special variants of WARN_*_SCHED()
   and BUG_*_SCHED() macros.


I personally prefer the 2nd solution. What do you think about it,
please?


Best Regards,
Petr


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-12  9:44         ` Petr Mladek
@ 2016-08-15 14:26           ` Vladislav Levenetz
  2016-08-16  9:04             ` Petr Mladek
  2016-08-18  2:27           ` Sergey Senozhatsky
  1 sibling, 1 reply; 40+ messages in thread
From: Vladislav Levenetz @ 2016-08-15 14:26 UTC (permalink / raw)
  To: Petr Mladek, Viresh Kumar
  Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo,
	Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky,
	Greg Kroah-Hartman



On 08/12/2016 12:44 PM, Petr Mladek wrote:
> On Wed 2016-08-10 14:17:55, Viresh Kumar wrote:
>> +Vladi/Greg,
>>
>> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote:
>>> On Mon 04-04-16 15:51:49, Andrew Morton wrote:
>>>>> +static int __init init_printk_kthread(void)
>>>>> +{
>>>>> +   struct task_struct *thread;
>>>>> +
>>>>> +   if (printk_sync)
>>>>> +           return 0;
>>>>> +
>>>>> +   thread = kthread_run(printk_kthread_func, NULL, "printk");
>>>> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
>>>> task will block printk for ever.  This seems bad.
>>> I have to research this a bit but won't the SCHED_FIFO task that has
>>> potentially unbounded amount of work lockup the CPU even though it does
>>> occasional cond_resched()?
>> We are facing complete hogs because of the printk thread being a SCHED_FIFO
>> task and have this patch to fix it up for now.
>>
>> Author: Vladislav Levenetz <vblagoev@mm-sol.com>
>> Date:   Wed Aug 10 13:58:00 2016 -0700
>>
>>      SW-7786: printk: Lower the priority of printk thread
>>
>>      Flooding the console (with a test module) in a tight loop indefinitely
>>      makes android user interface very sluggish. Opening YouTube app and the
>>      device hangs and becomes even more unresponsive to the point it
>>      completely hangs.
>>
>>      The asynchronous printk thread is a SCHED FIFO thread with priority
>>      MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
>>      instead, we observe much better performance using the same printk flood
>>      test. We don't even notice any kind of sluggishness during device usage.
>>      We can play a YouTube clip smoothly and use the device normally in
>>      general.  The kernel log looks fine as well, as the flood of messages
>>      continue normally.
>>
>>      Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com>
>>      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>   kernel/printk/printk.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index c32872872cb6..ad5b30e5e6d9 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
>>   static int __init_printk_kthread(void)
>>   {
>>          struct task_struct *thread;
>> -       struct sched_param param = {
>> -               .sched_priority = MAX_RT_PRIO - 1,
>> -       };
>>
>>          if (!printk_kthread_can_run || printk_sync || printk_kthread)
>>                  return 0;
>> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
>>                  return PTR_ERR(thread);
>>          }
>>
>> -       sched_setscheduler(thread, SCHED_FIFO, &param);
>>          printk_kthread = thread;
>>          return 0;
>>   }
> IMHO, this is fine. We force the synchronous mode in critical
> situations anyway.
>
> But I was curious if we could hit a printk() from wake_up_process().
> The change above causes the fair scheduler to be used, and there is
> the following call chain [*]
>
>    vprintk_emit()
>    -> wake_up_process()
>     -> try_to_wake_up()
>      -> ttwu_queue()
>       -> ttwu_do_activate()
>        -> ttwu_activate()
>         -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  -> cfs_rq_of()
> 	   -> task_of()
> 	    -> WARN_ON_ONCE(!entity_is_task(se))
>
> We should never trigger this because printk_kthread is a task.
> But what if the data gets inconsistent?
>
> Then there is the following chain:
>
>    vprintk_emit()
>    -> wake_up_process()
>     -> try_to_wake_up()
>      -> ttwu_queue()
>       -> ttwu_do_activate()
>        -> ttwu_activate()
>         -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  ->hrtick_update()
> 	   -> hrtick_start_fair()
> 	    -> WARN_ON(task_rq(p) != rq)
>
> This looks like another paranoid consistency check that might be
> triggered when the scheduler gets messed up.
>
> I see a few possible solutions:
>
> 1. Replace the WARN_ONs by printk_deferred().
>
>     This is the usual solution but it would make debugging less convenient.
>
>
> 2. Force synchronous printk inside WARN()/BUG() macros.
>
>     This would make sense for other reasons as well. These are printed
>     when the system is in a strange state. There is no guarantee that
>     the printk_kthread will get scheduled.
>
>
> 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
>     printk_func.
>
>     It might be elegant. But we do not want this outside the scheduler
>     code. Therefore we would need special WARN_*_SCHED() and
>     BUG_*_SCHED() variants of the macros.
>
>
> I personally prefer the 2nd solution. What do you think about it,
> please?
>
>
> Best Regards,
> Petr

Hi Petr,

Sorry for the late reply.

Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if 
only a single CPU is running.
We already had such a situation with system suspend. During a specific
test on our device we sometimes hit a WARN from the timekeeping core
(cannot recall which one exactly; Viresh has it) from a printk wake_up
path during system suspend, with only one CPU still running.
So we were forced to make printing synchronous in the suspend path
prior to disabling all non-boot CPUs.

Your solution number 2) sounds reasonable to me.

I'm wondering if we could hit a WARN()/BUG() somewhere from the fair
scheduler, like in the example you made for the RT sched?

Thanks.


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-15 14:26           ` Vladislav Levenetz
@ 2016-08-16  9:04             ` Petr Mladek
  0 siblings, 0 replies; 40+ messages in thread
From: Petr Mladek @ 2016-08-16  9:04 UTC (permalink / raw)
  To: Vladislav Levenetz
  Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	Sergey Senozhatsky, Greg Kroah-Hartman

On Mon 2016-08-15 17:26:50, Vladislav Levenetz wrote:
> On 08/12/2016 12:44 PM, Petr Mladek wrote:
> >But I was curious if we could hit a printk() from wake_up_process().
> >The change above causes the fair scheduler to be used, and there is
> >the following call chain [*]
> >
> >   vprintk_emit()
> >   -> wake_up_process()
> >    -> try_to_wake_up()
> >     -> ttwu_queue()
> >      -> ttwu_do_activate()
> >       -> ttwu_activate()
> >        -> activate_task()
> >	-> enqueue_task()
> >	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> >	  -> cfs_rq_of()
> >	   -> task_of()
> >	    -> WARN_ON_ONCE(!entity_is_task(se))
> >
> >We should never trigger this because printk_kthread is a task.
> >But what if the data gets inconsistent?
> >
> >Then there is the following chain:
> >
> >   vprintk_emit()
> >   -> wake_up_process()
> >    -> try_to_wake_up()
> >     -> ttwu_queue()
> >      -> ttwu_do_activate()
> >       -> ttwu_activate()
> >        -> activate_task()
> >	-> enqueue_task()
> >	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> >	  ->hrtick_update()
> >	   -> hrtick_start_fair()
> >	    -> WARN_ON(task_rq(p) != rq)
> >
> >This looks like another paranoid consistency check that might be
> >triggered when the scheduler gets messed up.
> >
> >I see a few possible solutions:
> >
> >1. Replace the WARN_ONs by printk_deferred().
> >
> >    This is the usual solution but it would make debugging less convenient.
> >
> >
> >2. Force synchronous printk inside WARN()/BUG() macros.
> >
> >    This would make sense for other reasons as well. These are printed
> >    when the system is in a strange state. There is no guarantee that
> >    the printk_kthread will get scheduled.
> >
> >
> >3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> >    printk_func.
> >
> >    It might be elegant. But we do not want this outside the scheduler
> >    code. Therefore we would need special WARN_*_SCHED() and
> >    BUG_*_SCHED() variants of the macros.
> >
> >
> >I personally prefer the 2nd solution. What do you think about it,
> >please?
> >
> >
> >Best Regards,
> >Petr
> 
> Hi Petr,
> 
> Sorry for the late reply.

No problem.

> Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if
> only a single CPU is running.

I think that the deadlock might also happen with more CPUs if
async printk is enabled. I mean:

  printk_emit()
    wake_up_process()
      try_to_wake_up()
	raw_spin_lock_irqsave(&p->pi_lock, flags)  !!!!
	ttwu_queue()
	  ttwu_do_activate()
	    ttwu_activate()
	      activate_task()
		enqueue_task()
		  enqueue_task_fair()	    via p->sched_class->enqueue_task
		    hrtick_update()
		      hrtick_start_fair()
			WARN_ON(task_rq(p) != rq)
			  printk()
			    vprintk_emit()
			      wake_up_process()
				try_to_wake_up()
				  raw_spin_lock_irqsave(&p->pi_lock,
					flags)

There is a deadlock because p->pi_lock is already taken by
the first try_to_wake_up().

In other words, I think that the single running CPU was only
a symptom; it was not the root cause of the deadlock.
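The self-deadlock can be modeled in a few lines. A userspace sketch
(the function and variable names are invented; the real lock is
p->pi_lock taken in try_to_wake_up()):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the recursion: try_to_wake_up() takes a non-recursive
 * lock; if the enqueue path fires a WARN(), the WARN's printk()
 * wakes printk_kthread, which re-enters try_to_wake_up() while
 * the lock is still held by the same context. */
static bool pi_lock_held;
static bool deadlock_detected;

static void model_try_to_wake_up(bool warn_fires)
{
	if (pi_lock_held) {
		/* The real code would spin here forever. */
		deadlock_detected = true;
		return;
	}
	pi_lock_held = true;
	if (warn_fires) {
		/* WARN() -> printk() -> wake_up_process(printk_kthread) */
		model_try_to_wake_up(false);
	}
	pi_lock_held = false;
}
```

A second CPU does not help here: the nested acquisition happens in the
same context that already holds the lock, so it can never be released.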

> We already had such a situation with system suspend. During a
> specific test on our device we sometimes hit a WARN from the
> timekeeping core (cannot recall which one exactly; Viresh has it)
> from a printk wake_up path during system suspend, with only one
> CPU still running.
> So we were forced to make printing synchronous in the suspend path
> prior to disabling all non-boot CPUs.
> 
> Your solution number 2) sounds reasonable to me.

Good.
 
> I'm wondering if we could hit a WARN()/BUG() somewhere from the fair
> scheduler, like in the example you made for the RT sched?

Unfortunately, it looks like we could. The example above is actually
from the fair scheduler.

Best Regards,
Petr


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-12  9:44         ` Petr Mladek
  2016-08-15 14:26           ` Vladislav Levenetz
@ 2016-08-18  2:27           ` Sergey Senozhatsky
  2016-08-18  9:33             ` Petr Mladek
  1 sibling, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-18  2:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	Sergey Senozhatsky, vlevenetz, Greg Kroah-Hartman

Hello,

really sorry for the very long reply.

On (08/12/16 11:44), Petr Mladek wrote:
[..]
> IMHO, this is fine. We force the synchronous mode in critical
> situations anyway.

yes, I think it makes sense to lower the priority (we also briefly
discussed this in private emails with Viresh). I'd still
prefer to have forced sync-printk on suspend/hibernate/etc., though.

> But I was curious if we could hit a printk() from wake_up_process().
> The change above causes the fair scheduler to be used, and there is
> the following call chain [*]
> 
>   vprintk_emit()
>   -> wake_up_process()
>    -> try_to_wake_up()
>     -> ttwu_queue()
>      -> ttwu_do_activate()
>       -> ttwu_activate()
>        -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  -> cfs_rq_of()
> 	   -> task_of()
> 	    -> WARN_ON_ONCE(!entity_is_task(se))
> 
> We should never trigger this because printk_kthread is a task.
> But what if the data gets inconsistent?
> 
> Then there is the following chain:
> 
>   vprintk_emit()
>   -> wake_up_process()
>    -> try_to_wake_up()
>     -> ttwu_queue()
>      -> ttwu_do_activate()
>       -> ttwu_activate()
>        -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  ->hrtick_update()
> 	   -> hrtick_start_fair()
> 	    -> WARN_ON(task_rq(p) != rq)
> 
> This looks like another paranoid consistency check that might be
> triggered when the scheduler gets messed up.
> 
> I see a few possible solutions:
> 
> 1. Replace the WARN_ONs by printk_deferred().
> 
>    This is the usual solution but it would make debugging less convenient.

what I did internally was a combination of #1 and #3: I introduced a
dump_stack_deferred() function which is basically (almost) a copy-paste
of dump_stack() from lib/dump_stack.c, with the difference that it calls
printk_deferred(), and I added a WARN_ON_DEFERRED() macro.


> 2. Force synchronous printk inside WARN()/BUG() macros.

will it help? semaphore up() calls wake_up_process() regardless of the context.
not to mention that we still may have spin_dump() enabled.


> 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
>    printk_func.
> 
>    It might be elegant. But we do not want this outside the scheduler
>    code. Therefore we would need special WARN_*_SCHED() and
>    BUG_*_SCHED() variants of the macros.
>
> I personally prefer the 2nd solution. What do you think about it,
> please?

I personally think a combo of #1 and #3 is a bit better than plain #2.

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-18  2:27           ` Sergey Senozhatsky
@ 2016-08-18  9:33             ` Petr Mladek
  2016-08-18  9:51               ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-18  9:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Thu 2016-08-18 11:27:12, Sergey Senozhatsky wrote:
> Hello,
> 
> really sorry for the very long reply.
> 
> On (08/12/16 11:44), Petr Mladek wrote:
> [..]
> > IMHO, this is fine. We force the synchronous mode in critical
> > situations anyway.
> 
> yes, I think it makes sense to lower the priority (we also briefly
> discussed this in private emails with Viresh). I'd still
> prefer to have forced sync-printk on suspend/hibernate/etc., though.

Sounds fine to me.

> > But I was curious if we could hit a printk() from wake_up_process().
> > The change above causes the fair scheduler to be used, and there is
> > the following call chain [*]
> > 
> > I see a few possible solutions:
> > 
> > 1. Replace the WARN_ONs by printk_deferred().
> > 
> >    This is the usual solution but it would make debugging less convenient.
> 
> what I did internally was a combination of #1 and #3: I introduced a
> dump_stack_deferred() function which is basically (almost) a copy-paste
> of dump_stack() from lib/dump_stack.c, with the difference that it calls
> printk_deferred(), and I added a WARN_ON_DEFERRED() macro.
> 
> 
> > 2. Force synchronous printk inside WARN()/BUG() macros.
> 
> > will it help? semaphore up() calls wake_up_process() regardless of the context.
> not to mention that we still may have spin_dump() enabled.

Good point. That changes my preferences :-)

> 
> > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> >    printk_func.
> > 
> >    It might be elegant. But we do not want this outside the scheduler
> >    code. Therefore we would need special WARN_*_SCHED() and
> >    BUG_*_SCHED() variants of the macros.

Also we need to make sure that everything will be done on a single CPU
as printk_func is a per-CPU variable.

> > I personally prefer the 2nd solution. What do you think about it,
> > please?
> 
> I personally think a combo of #1 and #3 is a bit better than plain #2.

I would need to see how the code looks and whether it really works.
But yes, it seems that this is the way to go.

Best Regards,
Petr


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-18  9:33             ` Petr Mladek
@ 2016-08-18  9:51               ` Sergey Senozhatsky
  2016-08-18 10:56                 ` Petr Mladek
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-18  9:51 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton,
	Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/18/16 11:33), Petr Mladek wrote:
[..]
> > > 2. Force synchronous printk inside WARN()/BUG() macros.
> > 
> > will it help? semaphore up() calls wake_up_process() regardless of the context.
> > not to mention that we still may have spin_dump() enabled.
> 
> Good point. That changes my preferences :-)
> 
> > 
> > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> > >    printk_func.
> > > 
> > >    It might be elegant. But we do not want this outside the scheduler
> > >    code. Therefore we would need special WARN_*_SCHED() and
> > >    BUG_*_SCHED() variants of the macros.
> 
> Also we need to make sure that everything will be done on a single CPU
> as printk_func is a per-CPU variable.

lib/dump_stack.c dump_stack() disables local IRQs, so we will stay
on the same CPU.

schematically, dump_stack_deferred() looks like the one below. I don't
dump registers; that would require one more *_deferred() function.
and as you can see I've replaced show_stack() with save_stack_trace().
it's not entirely universal (as of now), but it worked for my needs at
that time.

---

asmlinkage __visible void dump_stack_deferred(void)
{
       unsigned long flags;
       unsigned long stack_trace[16];
       int was_locked;
       int old, cpu;
       int i;

       struct stack_trace trace = {
               .nr_entries = 0,
               .max_entries = ARRAY_SIZE(stack_trace),
               .entries = stack_trace,
               .skip = 0,
       };

       local_irq_save(flags);
retry:
       cpu = smp_processor_id();
       old = atomic_cmpxchg(&dump_lock, -1, cpu);
       if (old == -1) {
               was_locked = 0;
       } else if (old == cpu) {
               was_locked = 1;
       } else {
               cpu_relax();
               goto retry;
       }

       save_stack_trace(&trace);

       for (i = 0; i < trace.nr_entries; i++) {
               printk_deferred("[%p] %pS\n",
                               (void *)stack_trace[i],
                               (void *)stack_trace[i]);
       }

       if (!was_locked)
               atomic_set(&dump_lock, -1);

       local_irq_restore(flags);
}

---

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-18  9:51               ` Sergey Senozhatsky
@ 2016-08-18 10:56                 ` Petr Mladek
  2016-08-19  6:32                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-18 10:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Thu 2016-08-18 18:51:44, Sergey Senozhatsky wrote:
> On (08/18/16 11:33), Petr Mladek wrote:
> [..]
> > > > 2. Force synchronous printk inside WARN()/BUG() macros.
> > > 
> > > will it help? semaphore up() calls wake_up_process() regardless of the context.
> > > not to mention that we still may have spin_dump() enabled.
> > 
> > Good point. That changes my preferences :-)
> > 
> > > 
> > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> > > >    printk_func.
> > > > 
> > > >    It might be elegant. But we do not want this outside the scheduler
> > >    code. Therefore we would need special WARN_*_SCHED() and
> > >    BUG_*_SCHED() variants of the macros.
> > 
> > Also we need to make sure that everything will be done on a single CPU
> > as printk_func is a per-CPU variable.
> 
> lib/dump_stack.c dump_stack() disables local IRQs, so we will stay
> on the same CPU.

It is less important to stay on the same CPU if you use
printk_deferred() directly. But it is a must-have if you
use it via the printk_func per-CPU variable.

The advantage of the printk_func trick is that it is transparent.
You do not need to modify any existing functions used by WARN()/BUG()
macros.

Best Regards,
Petr


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-18 10:56                 ` Petr Mladek
@ 2016-08-19  6:32                   ` Sergey Senozhatsky
  2016-08-19  9:54                     ` Petr Mladek
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-19  6:32 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton,
	Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/18/16 12:56), Petr Mladek wrote:
> On Thu 2016-08-18 18:51:44, Sergey Senozhatsky wrote:
> > On (08/18/16 11:33), Petr Mladek wrote:
> > [..]
> > > > > 2. Force synchronous printk inside WARN()/BUG() macros.
> > > > 
> > > > will it help? semaphore up() calls wake_up_process() regardless of the context.
> > > > not to mention that we still may have spin_dump() enabled.
> > > 
> > > Good point. That changes my preferences :-)
> > > 
> > > > 
> > > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> > > > >    printk_func.
> > > > > 
> > > > >    It might be elegant. But we do not want this outside the scheduler
> > > >    code. Therefore we would need special WARN_*_SCHED() and
> > > >    BUG_*_SCHED() variants of the macros.
> > > 
> > > Also we need to make sure that everything will be done on a single CPU
> > > as printk_func is a per-CPU variable.
> > 
> > lib/dump_stack.c dump_stack() disables local IRQs, so we will stay
> > on the same CPU.
> 
> > It is less important to stay on the same CPU if you use
> > printk_deferred() directly. But it is a must-have if you
> > use it via the printk_func per-CPU variable.
> 
> The advantage of the printk_func trick is that it is transparent.
> You do not need to modify any existing functions used by WARN()/BUG()
> macros.

good point.

so something like below, perhaps. I'm less sure about
deferred BUG()/BUG_ON():

#define DEFERRED_BUG() do { 		\
	printk_deferred_enter();	\
	BUG();				\
	printk_deferred_exit();		\
} while (0)

#define DEFERRED_BUG_ON(condition) do { \
	printk_deferred_enter();	\
	BUG_ON(condition);		\
	printk_deferred_exit();		\
} while (0)

depending on .config, BUG() may never return -- passing control
to do_exit(), so printk_deferred_exit() won't be executed. thus we
probably need to have a per-cpu variable that would indicate that
we are in deferred_bug. hm... but do we really need deferred BUG()
in the first place?


Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 include/asm-generic/bug.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/printk.h    |  6 ++++++
 kernel/printk/internal.h  | 18 +----------------
 kernel/printk/printk.c    | 42 ++++++++++++++++++++++++++++++++++-----
 4 files changed, 94 insertions(+), 22 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 6f96247..d72ee1e 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -60,6 +60,10 @@ struct bug_entry {
  * significant issues that need prompt attention if they should ever
  * appear at runtime.  Use the versions with printk format strings
  * to provide better diagnostics.
+ *
+ * DEFERRED_WARN macros call printk_deferred() to print the messages
+ * and are meant to be used from the contexts where direct printk()
+ * can deadlock the system.
  */
 #ifndef __WARN_TAINT
 extern __printf(3, 4)
@@ -145,6 +149,52 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 	unlikely(__ret_warn_once);				\
 })
 
+#define DEFERRED_WARN_ON(condition) ({				\
+	int __ret_warn_on = !!(condition);			\
+	if (unlikely(__ret_warn_on)) {				\
+		printk_deferred_enter();			\
+		__WARN();					\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_on);				\
+})
+
+#define DEFERRED_WARN(condition, format...) ({			\
+	int __ret_warn_on = !!(condition);			\
+	if (unlikely(__ret_warn_on)) {				\
+		printk_deferred_enter();			\
+		__WARN_printf(format);				\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_on);				\
+})
+
+#define DEFERRED_WARN_ON_ONCE(condition)	({		\
+	static bool __section(.data.unlikely) __warned;		\
+	int __ret_warn_once = !!(condition);			\
+								\
+	if (unlikely(__ret_warn_once && !__warned)) {		\
+		__warned = true;				\
+		printk_deferred_enter();			\
+		WARN_ON(1);					\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_once);				\
+})
+
+#define DEFERRED_WARN_ONCE(condition, format...)	({	\
+	static bool __section(.data.unlikely) __warned;		\
+	int __ret_warn_once = !!(condition);			\
+								\
+	if (unlikely(__ret_warn_once && !__warned)) {		\
+		__warned = true;				\
+		printk_deferred_enter();			\
+		WARN(1, format);				\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_once);				\
+})
+
 #else /* !CONFIG_BUG */
 #ifndef HAVE_ARCH_BUG
 #define BUG() do {} while (1)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 696a56b..5142654 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -193,6 +193,9 @@ void __init setup_log_buf(int early);
 __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+
+extern void printk_deferred_enter(void);
+extern void printk_deferred_exit(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -252,6 +255,9 @@ static inline void dump_stack_print_info(const char *log_lvl)
 static inline void show_regs_print_info(const char *log_lvl)
 {
 }
+
+static inline void printk_deferred_enter(void) {}
+static inline void printk_deferred_exit(void) {}
 #endif
 
 extern asmlinkage void dump_stack(void) __cold;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..70f1cf2 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -19,23 +19,12 @@
 typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
 
 int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+DECLARE_PER_CPU(printk_func_t, printk_func);
 
 #ifdef CONFIG_PRINTK_NMI
 
 extern raw_spinlock_t logbuf_lock;
 
-/*
- * printk() could not take logbuf_lock in NMI context. Instead,
- * it temporary stores the strings into a per-CPU buffer.
- * The alternative implementation is chosen transparently
- * via per-CPU variable.
- */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
-{
-	return this_cpu_read(printk_func)(fmt, args);
-}
-
 extern atomic_t nmi_message_lost;
 static inline int get_nmi_message_lost(void)
 {
@@ -44,11 +33,6 @@ static inline int get_nmi_message_lost(void)
 
 #else /* CONFIG_PRINTK_NMI */
 
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
-{
-	return vprintk_default(fmt, args);
-}
-
 static inline int get_nmi_message_lost(void)
 {
 	return 0;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0d3e026..6e260a0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2026,7 +2026,17 @@ asmlinkage __visible int printk(const char *fmt, ...)
 	int r;
 
 	va_start(args, fmt);
-	r = vprintk_func(fmt, args);
+	/*
+	 * printk() could not take logbuf_lock in NMI context. Instead,
+	 * it temporarily stores the strings into a per-CPU buffer. The
+	 * alternative implementation is chosen transparently via per-CPU
+	 * variable.
+	 *
+	 * We also switch printk_func from default to vprintk_deferred()
+	 * when in DEFERRED_WARN()/DEFERRED_BUG(), because it may be
+	 * unsafe to execute vprintk_default().
+	 */
+	r = this_cpu_read(printk_func)(fmt, args);
 	va_end(args);
 
 	return r;
@@ -3037,6 +3047,17 @@ void wake_up_klogd(void)
 	preempt_enable();
 }
 
+static int vprintk_deferred(const char *fmt, va_list args)
+{
+	int r;
+
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+
+	return r;
+}
+
 int printk_deferred(const char *fmt, ...)
 {
 	va_list args;
@@ -3044,16 +3065,27 @@ int printk_deferred(const char *fmt, ...)
 
 	preempt_disable();
 	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	r = vprintk_deferred(fmt, args);
 	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
 
 	return r;
 }
 
+void printk_deferred_enter(void)
+{
+	preempt_disable();
+	this_cpu_write(printk_func, vprintk_deferred);
+}
+EXPORT_SYMBOL(printk_deferred_enter);
+
+void printk_deferred_exit(void)
+{
+	this_cpu_write(printk_func, vprintk_default);
+	preempt_enable();
+}
+EXPORT_SYMBOL(printk_deferred_exit);
+
 /*
  * printk rate limiting, lifted from the networking subsystem.
  *
-- 
2.9.3


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-19  6:32                   ` Sergey Senozhatsky
@ 2016-08-19  9:54                     ` Petr Mladek
  2016-08-19 19:00                       ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-19  9:54 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote:
> On (08/18/16 12:56), Petr Mladek wrote:
> > The advantage of the printk_func trick is that it is transparent.
> > You do not need to modify any existing functions used by WARN()/BUG()
> > macros.
> 
> good point.
> 
> so something like below, perhaps. I'm less sure about
> deferred BUG()/BUG_ON():
> 
> #define DEFERRED_BUG() do { 		\
> 	printk_deferred_enter();	\
> 	BUG();				\
> 	printk_deferred_exit();		\
> } while (0)
> 
> #define DEFERRED_BUG_ON(condition) do { \
> 	printk_deferred_enter();	\
> 	BUG_ON(condition);		\
> 	printk_deferred_exit();		\
> } while (0)
> 
> depending on .config, BUG() may never return -- passing control
> to do_exit(), so printk_deferred_exit() won't be executed. thus we
> probably need to have a per-cpu variable that would indicate that
> we are in deferred_bug. hm... but do we really need deferred BUG()
> in the first place?

Good question. I am not aware of any BUG_ON() that would be called from
wake_up_process(), but it is hard to check everything.

A conservative approach would be to force synchronous printk from
BUG_ON().


> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> ---
> 
>  include/asm-generic/bug.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/printk.h    |  6 ++++++
>  kernel/printk/internal.h  | 18 +----------------
>  kernel/printk/printk.c    | 42 ++++++++++++++++++++++++++++++++++-----
>  4 files changed, 94 insertions(+), 22 deletions(-)
> 
> diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
> index 6f96247..d72ee1e 100644
> --- a/include/asm-generic/bug.h
> +++ b/include/asm-generic/bug.h
> @@ -60,6 +60,10 @@ struct bug_entry {
>   * significant issues that need prompt attention if they should ever
>   * appear at runtime.  Use the versions with printk format strings
>   * to provide better diagnostics.
> + *
> + * DEFERRED_WARN macros call printk_deferred() to print the messages
> + * and are meant to be used from the contexts where direct printk()
> + * can deadlock the system.
>   */
>  #ifndef __WARN_TAINT
>  extern __printf(3, 4)
> @@ -145,6 +149,52 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
>  	unlikely(__ret_warn_once);				\
>  })
>  
> +#define DEFERRED_WARN_ON(condition) ({				\
> +	int __ret_warn_on = !!(condition);			\
> +	if (unlikely(__ret_warn_on)) {				\
> +		printk_deferred_enter();			\
> +		__WARN();					\
> +		printk_deferred_exit();				\
> +	}							\
> +	unlikely(__ret_warn_on);				\
> +})

This looks reasonable to me.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 0d3e026..6e260a0 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
>  
> +void printk_deferred_enter(void)
> +{
> +	preempt_disable();
> +	this_cpu_write(printk_func, vprintk_deferred);
> +}
> +EXPORT_SYMBOL(printk_deferred_enter);
> +
> +void printk_deferred_exit(void)
> +{
> +	this_cpu_write(printk_func, vprintk_default);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(printk_deferred_exit);
> +

This is racy with printk_nmi_enter() and printk_nmi_exit().
It needs to work both ways. It must keep printk_deferred()
even when DEFERRED_WARN_ON() is called in NMI context.
It also must keep printk_deferred() when the DEFERRED_WARN_ON()
is interrupted by an NMI.

I think that the best solution would be to allow nesting.
What about replacing the "printk_func" per-CPU variable
with a per-CPU atomic counter? Then we could just
check the counter in vprintk_emit() to see if it is
deferred or not.

The approach with printk_func was more generic. We thought
that it might be used even for a transparent early_printk().
But that could still be solved even with the per-CPU atomic
counter: we could fall back to early_printk() when a
global flag is set, or so.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-19  9:54                     ` Petr Mladek
@ 2016-08-19 19:00                       ` Jan Kara
  2016-08-20  5:24                         ` Sergey Senozhatsky
  2016-08-23 13:48                         ` Petr Mladek
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Kara @ 2016-08-19 19:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton,
	Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman

On Fri 19-08-16 11:54:55, Petr Mladek wrote:
> On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote:
> > On (08/18/16 12:56), Petr Mladek wrote:
> > > The advantage of the printk_func trick is that it is transparent.
> > > You do not need to modify any existing functions used by WARN()/BUG()
> > > macros.
> > 
> > good point.
> > 
> > so something like below, perhaps. I'm less sure about
> > deferred BUG()/BUG_ON():
> > 
> > #define DEFERRED_BUG() do { 		\
> > 	printk_deferred_enter();	\
> > 	BUG();				\
> > 	printk_deferred_exit();		\
> > } while (0)				\
> > 
> > #define DEFERRED_BUG_ON(condition) do { \
> > 	printk_deferred_enter();	\
> > 	BUG_ON(condition);		\
> > 	printk_deferred_exit();		\
> > } while (0)
> > 
> > depending on .config BUG() may never return back -- passing control
> > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > probably need to have a per-cpu variable that would indicate that
> > we are in deferred_bug. hm... but do we really need deferred BUG()
> > in the first place?
> 
> Good question. I am not aware of any BUG_ON() that would be called from
> wake_up_process() but it is hard to check everything.
> 
> A conservative approach would be to force synchronous printk from
> BUG_ON().

Just a quick thought: Cannot we just do printk_deferred_enter() when we are
about to call into the scheduler from printk code and printk_deferred_exit()
when leaving it? That would look like the least error-prone way of
handling this kind of recursion...

OTOH there's also the other possible direction for the recursion when we
are in the scheduler, holding some scheduler locks, decide to WARN which
enters printk, that ends up calling wake_up_process() which deadlocks
on scheduler locks... I don't see how to handle this type of recursion
inside the printk code itself easily and so far the answer was - use
printk_deferred() in the scheduler and don't use WARN...

Hum, maybe we could add a lockdep annotation to the WARN_ON and BUG_ON macros so
that it would grab and release console_sem (even if the condition is false).
That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON
calls that could deadlock.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-19 19:00                       ` Jan Kara
@ 2016-08-20  5:24                         ` Sergey Senozhatsky
  2016-08-22  4:15                           ` Sergey Senozhatsky
  2016-08-23 13:03                           ` Petr Mladek
  2016-08-23 13:48                         ` Petr Mladek
  1 sibling, 2 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-20  5:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Petr Mladek, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/19/16 21:00), Jan Kara wrote:
> > > depending on .config BUG() may never return back -- passing control
> > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > probably need to have a per-cpu variable that would indicate that
> > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > in the first place?
> > 
> > Good question. I am not aware of any BUG_ON() that would be called from
> > wake_up_process() but it is hard to check everything.
> > 
> > A conservative approach would be to force synchronous printk from
> > BUG_ON().
> 
> Just a quick thought: Cannot we just do printk_deferred_enter() when we are
> about to call into the scheduler from printk code and printk_deferred_exit()
> when leaving it? That would look like the least error-prone way how
> handling this kind of recursion...

interesting idea.
printk_deferred_enter() increments the preempt count, so there may be additional
obstacles and, as a result, ad-hoc workarounds that the scheduler people will
sincerely hate.
need to think more.

> OTOH there's also the other possible direction for the recursion when we
> are in the scheduler, holding some scheduler locks, decide to WARN which
> enters printk, that ends up calling wake_up_process() which deadlocks
> on scheduler locks... I don't see how to handle this type of recursion
> inside the printk code itself easily and so far the answer was - use
> printk_deferred() in the scheduler and don't use WARN...

the recursion detection is really tricky, yes. it seems (and I haven't
thought it through well enough) to be a bit simpler when we operate in async
printk mode, because we remove this uncontrollable console_unlock().
so we can do something like this:

vprintk_emit(....)
{
	local_irq_save();

	if (this_cpu_read(in_printk)) {
		log_store("BUG: printk recursion!");
		goto out;
	}

	this_cpu_write(in_printk, 1);

	raw_spin_lock(&logbuf_lock);
	log_store();
	raw_spin_unlock(&logbuf_lock);

	if (!in_sched) {
		if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
				can_printk_async()) {
			printk_kthread_need_flush_console = true;
			wake_up_process(printk_kthread);
		}
	}

	this_cpu_write(in_printk, 0);
out:
	local_irq_restore();
}

async printk mode from this point of view is sort of atomic.
we can even set different values of the per-CPU `in_printk' at various
stages of printk, which would permit better recursion handling.
for example, if we recurse from raw_spin_unlock(&logbuf_lock) then we
must re-init logbuf_lock, because it's 99% corrupted... and so on. but
I haven't really thought of it yet. it obviously doesn't work for sync
printk mode.

> Hum, maybe we could add lockdep annotation to a WARN_ON and BUG_ON macros so
> that it would grab and release console_sem (even if the condition is false).
> That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON
> calls that could deadlock.

hm.

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-20  5:24                         ` Sergey Senozhatsky
@ 2016-08-22  4:15                           ` Sergey Senozhatsky
  2016-08-23 12:19                             ` Petr Mladek
  2016-08-25 21:10                             ` Petr Mladek
  2016-08-23 13:03                           ` Petr Mladek
  1 sibling, 2 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-22  4:15 UTC (permalink / raw)
  To: Jan Kara, Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

Hello,

On (08/20/16 14:24), Sergey Senozhatsky wrote:
> On (08/19/16 21:00), Jan Kara wrote:
> > > > depending on .config BUG() may never return back -- passing control
> > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > > probably need to have a per-cpu variable that would indicate that
> > > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > > in the first place?
> > > 
> > > Good question. I am not aware of any BUG_ON() that would be called from
> > > wake_up_process() but it is hard to check everything.
> > > 
> > > A conservative approach would be to force synchronous printk from
> > > BUG_ON().
> > 
> > Just a quick thought: Cannot we just do printk_deferred_enter() when we are
> > about to call into the scheduler from printk code and printk_deferred_exit()
> > when leaving it? That would look like the least error-prone way how
> > handling this kind of recursion...
> 
> interesting idea.
> printk_deferred_enter() increments preempt count, so there may be additional
> obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate.
> need to think more.

so we probably can try something like this

---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2add7c5..b23f919 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2014,6 +2014,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
         * set_current_state() the waiting thread does.
         */
        smp_mb__before_spinlock();
+       printk_deferred_enter();
        raw_spin_lock_irqsave(&p->pi_lock, flags);
        if (!(p->state & state))
                goto out;
@@ -2073,6 +2074,7 @@ stat:
                ttwu_stat(p, cpu, wake_flags);
 out:
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+       printk_deferred_exit();
 
        return success;
 }


---

since we are basically interested in wake_up_process() only from the
printk() POV. not sure how acceptable two extra preempt_count operations
and two per-CPU writes for every try_to_wake_up() would be.


the other thing I just thought of is doing something as follows
!!!not tested, will not compile, just an idea!!!

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e260a0..bb8d719 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level,
        printk_delay();
 
        local_irq_save(flags);
+       printk_nmi_enter();
        this_cpu = smp_processor_id();
 
        /*
@@ -1804,6 +1805,7 @@ asmlinkage int vprintk_emit(int facility, int level,
                 */
                if (!oops_in_progress && !lockdep_recursing(current)) {
                        recursion_bug = true;
+                       printk_nmi_exit();
                        local_irq_restore(flags);
                        return 0;
                }
@@ -1920,6 +1922,7 @@ asmlinkage int vprintk_emit(int facility, int level,
        logbuf_cpu = UINT_MAX;
        raw_spin_unlock(&logbuf_lock);
        lockdep_on();
+       printk_nmi_exit();
        local_irq_restore(flags);
 
        /* If called from the scheduler, we can not call up(). */
---

so maybe we will not blow up in case of spin_dump() on logbuf_lock.
well, if logbuf_lock was corrupted then nothing will help us -- it's over.
but if raw_spin_lock(&logbuf_lock), for instance, spin_dump()-s because
logbuf_lock was not released after `loops_per_jiffy * HZ' then deferred
printing may help.


probably we can turn the NMI printk_func into a more general way of using
an alternative printk buffer and, for example, rename the API to
printk_alternative_enter() and printk_alternative_exit(), not to confuse
anyone. (yes, NMIs are not available on every platform)


and even more: we can redirect dump_stack() from the recursion detection
path to that alternative printk-buffer, which is a bit more helpful than
"BUG: recent printk recursion!"
!!!not tested, will not compile, just an idea!!!

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e260a0..ebce39a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level,
        printk_delay();
 
        local_irq_save(flags);
+       printk_nmi_enter();
        this_cpu = smp_processor_id();
 
        /*
@@ -1804,6 +1805,8 @@ asmlinkage int vprintk_emit(int facility, int level,
                 */
                if (!oops_in_progress && !lockdep_recursing(current)) {
                        recursion_bug = true;
+                       WARN_ON(1);		/* <<< dump_stack to alternative buffer */
+                       printk_nmi_exit();
                        local_irq_restore(flags);
                        return 0;
                }
@@ -1816,14 +1819,8 @@ asmlinkage int vprintk_emit(int facility, int level,
        logbuf_cpu = this_cpu;
 
        if (unlikely(recursion_bug)) {
-               static const char recursion_msg[] =
-                       "BUG: recent printk recursion!";
-
                recursion_bug = false;
-               /* emit KERN_CRIT message */
-               printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
-                                        NULL, 0, recursion_msg,
-                                        strlen(recursion_msg));
+               printk_nmi_flush();
        }
 
        nmi_message_lost = get_nmi_message_lost();
@@ -1920,6 +1917,7 @@ asmlinkage int vprintk_emit(int facility, int level,
        logbuf_cpu = UINT_MAX;
        raw_spin_unlock(&logbuf_lock);
        lockdep_on();
+       printk_nmi_exit();
        local_irq_restore(flags);
 
        /* If called from the scheduler, we can not call up(). */

---

just some quick thoughts.


	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-22  4:15                           ` Sergey Senozhatsky
@ 2016-08-23 12:19                             ` Petr Mladek
  2016-08-24  1:33                               ` Sergey Senozhatsky
  2016-08-25 21:10                             ` Petr Mladek
  1 sibling, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-23 12:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Mon 2016-08-22 13:15:20, Sergey Senozhatsky wrote:
> Hello,
> 
> On (08/20/16 14:24), Sergey Senozhatsky wrote:
> > On (08/19/16 21:00), Jan Kara wrote:
> > > > > depending on .config BUG() may never return back -- passing control
> > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > > > probably need to have a per-cpu variable that would indicate that
> > > > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > > > in the first place?
> > > > 
> > > > Good question. I am not aware of any BUG_ON() that would be called from
> > > > wake_up_process() but it is hard to check everything.
> > > > 
> > > > A conservative approach would be to force synchronous printk from
> > > > BUG_ON().
> > > 
> > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are
> > > about to call into the scheduler from printk code and printk_deferred_exit()
> > > when leaving it? That would look like the least error-prone way how
> > > handling this kind of recursion...
> > 
> > interesting idea.
> > printk_deferred_enter() increments preempt count, so there may be additional
> > obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate.
> > need to think more.
> 
> the other thing I just thought of is doing something as follows
> !!!not tested, will not compile, just an idea!!!
> 
> ---
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 6e260a0..bb8d719 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>         printk_delay();
>  
>         local_irq_save(flags);
> +       printk_nmi_enter();
>         this_cpu = smp_processor_id();

Huh, this looks very interesting but I am afraid that it will not fly.
The problem is that vprintk_nmi() is safe only when it is used
exclusively in NMI.

The following could happen with your code:

/**** normal context ****/
vprintk_emit()
  printk_nmi_enter()
  ...
  wake_up_process()
    WARN()
      printk()
	vprintk_nmi()
	  vsnprintf(..., "0123456789")

	    /* real NMI comes after writing "01234" */
	    /**** NMI context ****/
	    vprintk_nmi();
	      vsnprintf(..., "abcdefghijklmno");

	    /**** normal context ****/
	    /* we finish writing "56789" into the buffer */

=> part of the message from NMI gets broken "abcde56789klmno".

The lockless handling of the NMI per-CPU buffer is already not
trivial. I would be afraid to add more hacks to make
it writable in all contexts.
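The interleaving above can be reproduced with a tiny single-threaded model, where the "NMI" is simulated by a callback fired mid-copy and both writers read a stale length; `write_unsafe`, `nmi_writer` and the buffer layout are invented for the sketch:

```c
#include <assert.h>
#include <string.h>

static char buf[64];
static size_t buf_len;

/* Simulated NMI handler, fired at a chosen point during a copy. */
static void (*pending_nmi)(void);

/* Unsafe writer: reads buf_len, copies the message, and publishes the
 * new length only at the end -- like vsnprintf() into a buffer whose
 * length is committed after the write completes. */
static void write_unsafe(const char *msg, size_t fire_after)
{
	size_t pos = buf_len;		/* stale if we get interrupted */
	size_t i, n = strlen(msg);

	for (i = 0; i < n; i++) {
		buf[pos + i] = msg[i];
		if (i + 1 == fire_after && pending_nmi) {
			void (*nmi)(void) = pending_nmi;

			pending_nmi = NULL;
			nmi();		/* NMI writes at the same stale pos */
		}
	}
	buf_len = pos + n;
}

static void nmi_writer(void)
{
	write_unsafe("abcdefghijklmno", 0);
}
```

firing nmi_writer after five bytes of "0123456789" leaves the buffer holding "abcde56789klmno" -- the broken NMI message from the trace above.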

I am sorry about the bad news. This looked so promising at first.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-20  5:24                         ` Sergey Senozhatsky
  2016-08-22  4:15                           ` Sergey Senozhatsky
@ 2016-08-23 13:03                           ` Petr Mladek
  1 sibling, 0 replies; 40+ messages in thread
From: Petr Mladek @ 2016-08-23 13:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Sat 2016-08-20 14:24:30, Sergey Senozhatsky wrote:
> On (08/19/16 21:00), Jan Kara wrote:
> > > > depending on .config BUG() may never return back -- passing control
> > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > > probably need to have a per-cpu variable that would indicate that
> > > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > > in the first place?
> > > 
> > > Good question. I am not aware of any BUG_ON() that would be called from
> > > wake_up_process() but it is hard to check everything.
> > > 
> > > A conservative approach would be to force synchronous printk from
> > > BUG_ON().
> > 
> > Just a quick thought: Cannot we just do printk_deferred_enter() when we are
> > about to call into the scheduler from printk code and printk_deferred_exit()
> > when leaving it? That would look like the least error-prone way how
> > handling this kind of recursion...
> 
> interesting idea.
> printk_deferred_enter() increments preempt count, so there may be additional
> obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate.
> need to think more.

I wonder if this would be acceptable at least for
wake_up_process(). It seems to be the only scheduler function that we
are interested in. And we might call it from vprintk().

> > OTOH there's also the other possible direction for the recursion when we
> > are in the scheduler, holding some scheduler locks, decide to WARN which
> > enters printk, that ends up calling wake_up_process() which deadlocks
> > on scheduler locks... I don't see how to handle this type of recursion
> > inside the printk code itself easily and so far the answer was - use
> > printk_deferred() in the scheduler and don't use WARN...
> 
> the recursion detection is really tricky, yes. it seems (and I haven't
> thought of it good enough) to be a bit simpler when we operate in async
> printk mode, because we remove this uncontrollable console_unlock().
> so we can do something like this:
> 
> vprintk_emit(....)
> {
> 	local_irq_save();
> 
> 	if (this_cpu_read(in_printk)) {
> 		log_store(BUG: printk recursion!");
> 		goto out;
> 	}

This does not guarantee that we have the logbuf_lock. We might end up
here from the raw_spin_lock() call and the lock might be owned by
another CPU.

I am afraid that we could only set some global variable here.

> 
> 	this_cpu_write(in_printk) = 1;
> 
> 	raw_spin_lock(&logbuf_lock);
> 	log_store();
> 	raw_spin_unlock(&logbuf_lock);
> 
> 	if (!in_sched) {
> 		if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
> 				can_printk_async()) {
> 			printk_kthread_need_flush_console = true;
> 			wake_up_process(printk_kthread);
> 		}
> 	}
> 
> 	this_cpu_write(in_printk) = 0;
> out:
> 	local_irq_restore();
> }
>
> async printk mode from this point of view is sort of atomic.

This would prevent using printk_deferred() from the scheduler code.

A solution would be to set the per-CPU variable only around the
wake_up_process() call. Well, it is orthogonal to using
printk_deferred_enter() around calling wake_up_process().

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-19 19:00                       ` Jan Kara
  2016-08-20  5:24                         ` Sergey Senozhatsky
@ 2016-08-23 13:48                         ` Petr Mladek
  1 sibling, 0 replies; 40+ messages in thread
From: Petr Mladek @ 2016-08-23 13:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman

On Fri 2016-08-19 21:00:07, Jan Kara wrote:
> On Fri 19-08-16 11:54:55, Petr Mladek wrote:
> > On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote:
> > > On (08/18/16 12:56), Petr Mladek wrote:
> > > > The advantage of the printk_func trick is that it is transparent.
> > > > You do not need to modify any existing functions used by WARN()/BUG()
> > > > macros.
> > > 
> > > good point.
> > > 
> > > so something like below, perhaps. I'm less sure about
> > > deferred BUG()/BUG_ON():
> > > 
> > > #define DEFERRED_BUG() do { 		\
> > > 	printk_deferred_enter();	\
> > > 	BUG();				\
> > > 	printk_deferred_exit();		\
> > > } while (0)				\
> > > 
> > > #define DEFERRED_BUG_ON(condition) do { \
> > > 	printk_deferred_enter();	\
> > > 	BUG_ON(condition);		\
> > > 	printk_deferred_exit();		\
> > > } while (0)
> > > 
> > > depending on .config BUG() may never return back -- passing control
> > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > probably need to have a per-cpu variable that would indicate that
> > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > in the first place?
> > 
> > Good question. I am not aware of any BUG_ON() that would be called from
> > wake_up_process() but it is hard to check everything.
> > 
> > A conservative approach would be to force synchronous printk from
> > BUG_ON().
> 
> Just a quick thought: Cannot we just do printk_deferred_enter() when we are
> about to call into the scheduler from printk code and printk_deferred_exit()
> when leaving it? That would look like the least error-prone way how
> handling this kind of recursion...
> 
> OTOH there's also the other possible direction for the recursion when we
> are in the scheduler, holding some scheduler locks, decide to WARN which
> enters printk, that ends up calling wake_up_process() which deadlocks
> on scheduler locks... I don't see how to handle this type of recursion
> inside the printk code itself easily and so far the answer was - use
> printk_deferred() in the scheduler and don't use WARN...
> 
> Hum, maybe we could add lockdep annotation to a WARN_ON and BUG_ON macros so
> that it would grab and release console_sem (even if the condition is false).
> That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON
> calls that could deadlock.

The idea is interesting but I think that we do not want the fake
grab/release of the console_sem.

We use console_trylock() in vprintk_emit(). Please note the "try"
variant. So it is safe to call a nested printk() from the console code.

IMHO, we want to avoid calling console from the scheduler code because:

  1. console is slow and we do not want to block the scheduler.

  2. console_unlock() calls wake_up_process() and we do not want
     a deadlock by the scheduler locks.


Therefore I think that we want to detect something specific
from the scheduler that is also reachable from WARN()/printk().

Best Regards,
Petr

PS: My brain has circled around these problems several times today.
I hope that my opinion still makes some sense :-)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-23 12:19                             ` Petr Mladek
@ 2016-08-24  1:33                               ` Sergey Senozhatsky
  0 siblings, 0 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-24  1:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/23/16 14:19), Petr Mladek wrote:
> > the other thing I just thought of is doing something as follows
> > !!!not tested, will not compile, just an idea!!!
> > 
> > ---
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 6e260a0..bb8d719 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >         printk_delay();
> >  
> >         local_irq_save(flags);
> > +       printk_nmi_enter();
> >         this_cpu = smp_processor_id();
> 
> Huh, this looks very interesting but I am afraid that it will not fly.
> The problem is that vprintk_nmi() is safe only when it is used
> exclusively in NMI.
> 
> The following could happen with your code:

sure. I took NMI as a quick and easy example that did not require a lot
of words to explain, because you guys are familiar with NMI printing.
the thing is - we don't have to share the same buffer with the actual NMI,
but can instead have our own small 'alternative-printk' per-cpu buffers (a
page or two, perhaps). it's up to the `printk_func' that we set up at the
entry point and the flush function that we execute later. vprintk_nmi()
writes to this_cpu_ptr(&nmi_print_seq); vprintk_alter() would write to
this_cpu_ptr(&alt_print_seq), for example. the same with flushing:
printk_nmi_flush() reads from per_cpu(nmi_print_seq); the alter-printk
flush can read from per_cpu(alt_print_seq).
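a toy version of that dispatch could look like this, with thread-local variables in place of per-CPU ones and the printk_alternative_* names taken from the hypothetical renaming above; `do_printk` and the flat char buffers are simplifications of the real seq_buf machinery:

```c
#include <assert.h>
#include <string.h>

/* Thread-local stand-ins for the per-CPU main log and the small
 * "alternative" buffer written while printk must not re-enter itself. */
static _Thread_local char main_log[256];
static _Thread_local char alt_buf[256];

static void vprintk_default(const char *msg)
{
	strncat(main_log, msg, sizeof(main_log) - strlen(main_log) - 1);
}

static void vprintk_alt(const char *msg)
{
	strncat(alt_buf, msg, sizeof(alt_buf) - strlen(alt_buf) - 1);
}

/* The printk_func pointer is what enter/exit swap. */
static _Thread_local void (*printk_func)(const char *) = vprintk_default;

static void printk_alternative_enter(void) { printk_func = vprintk_alt; }
static void printk_alternative_exit(void)  { printk_func = vprintk_default; }

static void do_printk(const char *msg) { printk_func(msg); }

/* Flush copies the alternative buffer into the main log later, from a
 * context where the normal printk path is safe again. */
static void printk_alternative_flush(void)
{
	vprintk_default(alt_buf);
	alt_buf[0] = '\0';
}
```

messages emitted between enter and exit land in the side buffer and only reach the main log when the flush runs, which is the whole point of the redirect.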

[..]
> The lockless handling of the NMI per-CPU buffer already is not
> trivial. I would be afraid to add more hacks to make
> it writable in all contexts.

yes. we can reuse the idea of NMI printk and most of the code. the thing
we really care about is the `struct nmi_seq_buf' assigned to this particular
CPU, and there can be several seq buffers.

> I am sorry about the bad news. This was so promising on the first
> look.

it's not dead yet ;)

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-22  4:15                           ` Sergey Senozhatsky
  2016-08-23 12:19                             ` Petr Mladek
@ 2016-08-25 21:10                             ` Petr Mladek
  2016-08-26  1:56                               ` Sergey Senozhatsky
  1 sibling, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-25 21:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Mon 2016-08-22 13:15:20, Sergey Senozhatsky wrote:
> Hello,
> 
> On (08/20/16 14:24), Sergey Senozhatsky wrote:
> > On (08/19/16 21:00), Jan Kara wrote:
> > > > > depending on .config BUG() may never return back -- passing control
> > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we
> > > > > probably need to have a per-cpu variable that would indicate that
> > > > > we are in deferred_bug. hm... but do we really need deferred BUG()
> > > > > in the first place?
> > > > 
> since we are basically interested in wake_up_process() only from
> printk() POV. not sure how acceptable 2 * preempt_count and 2 * per-CPU
> writes for every try_to_wake_up().
> 
> 
> the other thing I just thought of is doing something as follows
> !!!not tested, will not compile, just an idea!!!
> 
> ---
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 6e260a0..bb8d719 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>         printk_delay();
>  
>         local_irq_save(flags);
> +       printk_nmi_enter();
>         this_cpu = smp_processor_id();
>  
>         /*
> @@ -1804,6 +1805,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>                  */
>                 if (!oops_in_progress && !lockdep_recursing(current)) {
>                         recursion_bug = true;
> +                       printk_nmi_exit();
>                         local_irq_restore(flags);
>                         return 0;
>                 }
> @@ -1920,6 +1922,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>         logbuf_cpu = UINT_MAX;
>         raw_spin_unlock(&logbuf_lock);
>         lockdep_on();
> +       printk_nmi_exit();
>         local_irq_restore(flags);
>  
>         /* If called from the scheduler, we can not call up(). */

I was so taken by the idea of temporarily forcing a lockless and
"trivial" printk implementation that I missed one thing.

Your patch uses the alternative printk() variant around logbuf_lock.
But this is not the problem with wake_up_process(). printk_deferred()
takes logbuf_lock without problems.

Our problem is with calling wake_up_process() recursively. The
deadlock is in the scheduler locks.

But the patch still inspired me. What about blocking the problematic
wake_up_process() call with a per-CPU variable? I mean something like
this completely untested code:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ca9733b802ce..93915eb1fd0d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1708,6 +1708,8 @@ static size_t cont_print_text(char *text, size_t size)
        return textlen;
 }
 
+DEFINE_PER_CPU(bool, printk_wakeup);
+
 asmlinkage int vprintk_emit(int facility, int level,
                            const char *dict, size_t dictlen,
                            const char *fmt, va_list args)
@@ -1902,8 +1904,17 @@ asmlinkage int vprintk_emit(int facility, int level,
        lockdep_off();
 
        if  (printk_kthread && !in_panic) {
+               bool __percpu *printk_wakeup_ptr;
+
                /* Offload printing to a schedulable context. */
-               wake_up_process(printk_kthread);
+               local_irq_save(flags);
+               printk_wakeup_ptr = this_cpu_ptr(&printk_wakeup);
+               if (!*printk_wakeup_ptr) {
+                       *printk_wakeup_ptr = true;
+                       wake_up_process(printk_kthread);
+                       *printk_wakeup_ptr = false;
+               }
+               local_irq_restore(flags);
                goto out_lockdep;
        } else {
                /*


We might eventually hide this in a wake_up_process_safe() or similar.

We might also need to use it in console_unlock() to avoid similar
recursion there.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-25 21:10                             ` Petr Mladek
@ 2016-08-26  1:56                               ` Sergey Senozhatsky
  2016-08-26  8:20                                 ` Sergey Senozhatsky
  2016-08-30  9:29                                 ` Petr Mladek
  0 siblings, 2 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-26  1:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/25/16 23:10), Petr Mladek wrote:
[..]
> I was so taken by the idea of temporarily forcing a lockless and
> "trivial" printk implementation that I missed one thing.
> 
> Your patch uses the alternative printk() variant around logbuf_lock.
> But logbuf_lock is not the problem with wake_up_process(): printk_deferred()
> takes logbuf_lock without problems.

you didn't miss anything, I think I wasn't descriptive enough and that caused
some confusion. this patch is not a replacement for the wake_up_process() patch
posted earlier in the thread, but an addition to it. not only will every WARN/BUG
issued from wake_up_process() do no good, but every lock we take is
potentially dangerous as well. In the simplest case because of the $LOCK-debug.c
files in kernel/locking (spin_lock in our case); in the worst case --
because of WARNs issued by log_store() and friends (there may be custom
modifications) or by violations of spinlock atomicity requirements.

For example,

	vprintk_emit()
		local_irq_save()
		raw_spin_lock()
		text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
		{
			vsnprintf()
			{
				if (WARN_ON_ONCE(size > INT_MAX))
					return 0;
			}
		}
		...

this is a rather unlikely event, sure; there must be some sort of
memory corruption or something else. but the thing is -- if it
happens, printk() will not be able to help.

the wake_up_process() change, posted earlier, uses a deferred version of
the WARN macro, but we definitely can (and had better) switch to the lockless
alternative printk() in both cases and not bother with new macros.
replacing all of the existing ones with 'safe' deferred versions is
a difficult task, but keeping track of newly introduced ones is even
harder (if possible at all).

> +DEFINE_PER_CPU(bool, printk_wakeup);
> +
>  asmlinkage int vprintk_emit(int facility, int level,
>                             const char *dict, size_t dictlen,
>                             const char *fmt, va_list args)
> @@ -1902,8 +1904,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>         lockdep_off();
>  
>         if  (printk_kthread && !in_panic) {
> +               bool *printk_wakeup_ptr;
> +
>                 /* Offload printing to a schedulable context. */
> -               wake_up_process(printk_kthread);
> +               local_irq_save(flags);
> +               printk_wakeup_ptr = this_cpu_ptr(&printk_wakeup);
> +               if (!*printk_wakeup_ptr) {
> +                       *printk_wakeup_ptr = true;
> +                       wake_up_process(printk_kthread);
> +                       *printk_wakeup_ptr = false;
> +               }
> +               local_irq_restore(flags);
>                 goto out_lockdep;
>         } else {

this can work, thanks.
I would probably prefer, for the time being, to have a single mechanism
that we will use in both cases. something like this:

vprintk_emit()
{
	alt_printk_enter();
	...
	log_store();
	...
	alt_printk_exit();

	wake_up_process() /* direct from async printk,
			      or indirect from console_unlock()->up() */
		alt_printk_enter();
		... enqueue task
		alt_printk_exit();
}

and we need some sort of rollback to the default printk() if
BUG() goes to panic() (on both HAVE_ARCH_BUG and !HAVE_ARCH_BUG
platforms):

static void oops_end(...)
{
...
	if (in_interrupt())
		panic("Fatal exception in interrupt");
	if (panic_on_oops)
		panic("Fatal exception");
	if (signr)
		do_exit(signr);
}

not so sure about do_exit(). we are specifically talking here about
wake_up_process()->try_to_wake_up(), which does all of its work under
raw_spin_lock_irqsave(&p->pi_lock), so IF there is a BUG() that does
do_exit() /* hard to imagine */, then nothing will help us out,
I think.

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-26  1:56                               ` Sergey Senozhatsky
@ 2016-08-26  8:20                                 ` Sergey Senozhatsky
  2016-08-30  9:29                                 ` Petr Mladek
  1 sibling, 0 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-26  8:20 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman, Sergey Senozhatsky

On (08/26/16 10:56), Sergey Senozhatsky wrote:


> but every lock we take is potentially dangerous as well.
...
> vprintk_emit()
> {
> 	alt_printk_enter();
> 	...
> 	log_store();
> 	...
> 	alt_printk_exit();
> 
> 	wake_up_process() /* direct from async printk,
> 			      or indirect from console_unlock()->up() */
> 		alt_printk_enter();
> 		... enqueue task
> 		alt_printk_exit();
> }


OTOH, on a very quick thought, up() also takes a spin lock, which
may produce a spin dump. so I'd probably prefer to keep the alt-printk
thing entirely in printk(). something like this:

vprintk_emit()
{
	alt_printk_enter()
	log_store()
	alt_printk_exit()

	if (async_printk)
	{
		alt_printk_enter()
		wake_up_process()
		alt_printk_exit()
	} else {
		if (console_trylock()) {
			console_unlock()
			{
				....
				alt_printk_enter()
				up()
				alt_printk_exit()
			}
		}
	}
}

this leaves console_trylock() `unprotected'. so I guess I'll do
something like this:

	} else {
		int ret;

		alt_printk_enter()
		ret = console_trylock();
		alt_printk_exit()

		if (ret)
			console_unlock();
	}

a bit ugly, but well, it is what it is. will think more about it.

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-26  1:56                               ` Sergey Senozhatsky
  2016-08-26  8:20                                 ` Sergey Senozhatsky
@ 2016-08-30  9:29                                 ` Petr Mladek
  2016-08-31  2:31                                   ` Sergey Senozhatsky
  1 sibling, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-30  9:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Fri 2016-08-26 10:56:41, Sergey Senozhatsky wrote:
> On (08/25/16 23:10), Petr Mladek wrote:
> [..]
> > I was so taken by the idea of temporary forcing a lockless and
> > "trivial" printk implementation that I missed one thing.
> > 
> > Your patch use the alternative printk() variant around logbuf_lock.
> > But this is not the problem with wake_up_process(). printk_deferred()
> > takes logbuf_lock without problems.
> 
> you didn't miss anything, I think I wasn't descriptive enough and that caused
> some confusion. this patch is not a replacement for the wake_up_process() patch
> posted earlier in the thread, but an addition to it. not only will every WARN/BUG
> issued from wake_up_process() do no good, but every lock we take is
> potentially dangerous as well. In the simplest case because of the $LOCK-debug.c
> files in kernel/locking (spin_lock in our case); in the worst case --
> because of WARNs issued by log_store() and friends (there may be custom
> modifications) or by violations of spinlock atomicity requirements.
> 
> For example,
> 
> 	vprintk_emit()
> 		local_irq_save()
> 		raw_spin_lock()
> 		text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
> 		{
> 			vsnprintf()
> 			{
> 				if (WARN_ON_ONCE(size > INT_MAX))
> 					return 0;
> 			}
> 		}
> 		...
> 
> this is a rather unlikely event, sure; there must be some sort of
> memory corruption or something else. but the thing is -- if it
> happens, printk() will not be able to help.
> 
> the wake_up_process() change, posted earlier, uses a deferred version of
> the WARN macro, but we definitely can (and had better) switch to the lockless
> alternative printk() in both cases and not bother with new macros.
> replacing all of the existing ones with 'safe' deferred versions is
> a difficult task, but keeping track of newly introduced ones is even
> harder (if possible at all).

I see. It makes some sense, and I would like to be on the safe side. I am
just afraid that adding yet another per-CPU buffer is too complex.
It adds quite some complexity to the code, and it scatters the
messages even more, so that it will be harder to get them from a
crash dump or flush them to the console when the system goes down.

It took a few years to get the solution for NMIs merged even though
it fixed real-life deadlocks for many people and customers.
I am afraid that it is not realistic to merge similarly complex
code to fix rather theoretical problems.

Sigh, I waited a few days before sending this comment. I do not want to
sound like a broken record. I had hoped that someone else would offer
an opinion.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-30  9:29                                 ` Petr Mladek
@ 2016-08-31  2:31                                   ` Sergey Senozhatsky
  2016-08-31  9:38                                     ` Petr Mladek
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-31  2:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/30/16 11:29), Petr Mladek wrote:
> > you didn't miss anything, I think I wasn't too descriptive and that caused
> > some confusion. this patch is not a replacement of wake_up_process() patch
> > posted earlier in the loop, but an addition to it. not only every WARN/BUG
> > issued from wake_up_process() will do no good, but every lock we take is
> > potentially dangerous as well. In the simplest case because of $LOCK-debug.c
> > files in kernel/locking (spin_lock in our case); in the worst case --
> > because of WARNs issued by log_store() and friends (there may be custom
> > modifications) or by violations of spinlock atomicity requirements.
> > 
> > For example,
> > 
> > 	vprintk_emit()
> > 		local_irq_save()
> > 		raw_spin_lock()
> > 		text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
> > 		{
> > 			vsnprintf()
> > 			{
> > 				if (WARN_ON_ONCE(size > INT_MAX))
> > 					return 0;
> > 			}
> > 		}
> > 		...
> > 
> > this is a rather unlikely event, sure, there must be some sort of
> > memory corruption or something else, but the thing is -- if it will
> > happen, printk() will not be willing to help.
> > 
> > wake_up_process() change, posted earlier, is using a deferred version of
> > WARN macro, but we definitely can (and we better do) switch to lockless
> > alternative printk() in both cases and don't bother with new macros.
> > replacing all of the existing ones with 'safe' deferred versions is
> > a difficult task, but keeping track of a newly introduced ones is even
> > harder (if possible at all).
> 
> I see. It makes some sense, and I would like to be on the safe side. I am
> just afraid that adding yet another per-CPU buffer is too complex.
> It adds quite some complexity to the code, and it scatters the
> messages even more, so that it will be harder to get them from a
> crash dump or flush them to the console when the system goes down.
> 
> It took a few years to get the solution for NMIs merged even though
> it fixed real-life deadlocks for many people and customers.
> I am afraid that it is not realistic to merge similarly complex
> code to fix rather theoretical problems.

well, I can still try it in my spare time. we can't fix printk() without
ever touching it, can we? so far we have basically only acknowledged the
existing printk() problems. we can do better than that, I think.

> Sigh, I waited a few days before sending this comment. I do not want to
> sound like a broken record.

no worries

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-31  2:31                                   ` Sergey Senozhatsky
@ 2016-08-31  9:38                                     ` Petr Mladek
  2016-08-31 12:52                                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-08-31  9:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Wed 2016-08-31 11:31:35, Sergey Senozhatsky wrote:
> On (08/30/16 11:29), Petr Mladek wrote:
> > > you didn't miss anything, I think I wasn't too descriptive and that caused
> > > some confusion. this patch is not a replacement of wake_up_process() patch
> > > posted earlier in the loop, but an addition to it. not only every WARN/BUG
> > > issued from wake_up_process() will do no good, but every lock we take is
> > > potentially dangerous as well. In the simplest case because of $LOCK-debug.c
> > > files in kernel/locking (spin_lock in our case); in the worst case --
> > > because of WARNs issued by log_store() and friends (there may be custom
> > > modifications) or by violations of spinlock atomicity requirements.
> > > 
> > > For example,
> > > 
> > > 	vprintk_emit()
> > > 		local_irq_save()
> > > 		raw_spin_lock()
> > > 		text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
> > > 		{
> > > 			vsnprintf()
> > > 			{
> > > 				if (WARN_ON_ONCE(size > INT_MAX))
> > > 					return 0;
> > > 			}
> > > 		}
> > > 		...
> > > 
> > > this is a rather unlikely event, sure, there must be some sort of
> > > memory corruption or something else, but the thing is -- if it will
> > > happen, printk() will not be willing to help.
> > > 
> > > wake_up_process() change, posted earlier, is using a deferred version of
> > > WARN macro, but we definitely can (and we better do) switch to lockless
> > > alternative printk() in both cases and don't bother with new macros.
> > > replacing all of the existing ones with 'safe' deferred versions is
> > > a difficult task, but keeping track of a newly introduced ones is even
> > > harder (if possible at all).
> > 
> > I see. It makes some sense. I would like to be on the safe side. I am
> > just afraid that adding yet another per-CPU buffer is too complex.
> > It adds quite some complexity to the code. And it even more scatters
> > the messages so that it will be harder to get them from the
> > crash dump or flush them to the console when the system goes down.
> > 
> > It took few years to get in the solution for NMIs even when
> > it fixed real life deadlocks for many people and customers.
> > I am afraid that it is not realistic to get in similar complex
> > code to fix rather theoretical problems.
> 
> well, I can still try it in my spare time. we can't fix printk() without
> ever touching it, can we? so far we have basically only acknowledged the
> existing printk() problems. we can do better than that, I think.

Ah, I do not want to discourage you from finding a solution for these
problems. I just wanted to point out problems with this particular
path of thinking (more per-CPU buffers, shuffling data between
them and the main buffer and console). But I might be wrong.

Sigh, there are many problems with printk(). I think we recently
discussed the following problems:

  1. Hung task or blocked irq handler when preemption/irqs
     are disabled and there are too many messages pushed to
     the console.

  2. Potential deadlocks when calling wake_up_process() by
     async printk and console_unlock().

  3. Clean up the console handling to split manipulation with
     consoles setting and pushing the messages. By other words,
     allow to push the console messages only when wanted.

  4. Messed output with continuous lines.


They are related but only partly. IMHO, it is not realistic to
fix all the problems in a single patchset. I wonder how to move
forward.

Our primary target was to solve the 1st problem with the async printk.
It has stalled because we hit the other areas. Let's look at them
from this point of view.


Ad 2. The potential deadlock with wake_up_process(). It popped up
      when using async printk during suspend.

      But it is not new! up() called by console_unlock() has the
      same problem. I thought that it was different because
      console_trylock() would prevent recursion but I was wrong.
      There seems to be a similar deadlock:

      console_unlock()
	up_console_sem()
	  up()
	    __up()
	      raw_spin_lock_irqsave(&sem->lock, flags);
	      wake_up_process()
		WARN()
		  printk()
		    vprintk_emit()
		      console_trylock()
			down_trylock_console_sem()
			  __down_trylock_console_sem()
			    down_trylock()
			      raw_spin_lock_irqsave(&sem->lock, flags);

      DEADLOCK: because sem->lock is already taken by __up()

      We hit the deadlock quickly when the printk kthread used SCHED_FIFO.
      But it must be almost impossible to hit it (to hit a WARN() there)
      with the normal scheduler. Otherwise, people would also hit it via
      console_unlock().

      => If we use the normal scheduler for the printk() thread, we should
      be on the safe side. The deadlock should get fixed, but it
      will be enough to fix it later, separately.


Ad 3. The cleanup of the console code might change some things. But
      it is a huge task with an unclear outcome. Anyway, we want to push
      messages to the console from printk(), so it will not help us
      avoid the async printk. It is a completely separate problem
      from my point of view.


Ad 4. The problems with messed-up continuous lines got a bit more
      visible with the async printk.

      It is because a partially flushed cont buffer is blocked
      until the rest of the line is flushed. This is more likely
      to happen when the messages are flushed by a separate
      process. I have proposed a fix that reads the end of the
      partially printed line from the main ring buffer. It
      makes the code simpler and might be a low-risk change.

      In any case, I would not mix it with a perfect solution
      for the continuous lines. IMHO, that would be too complex,
      and it is not worth it.

Did I miss anything?

I wonder how to separate the problems and make them more manageable.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-31  9:38                                     ` Petr Mladek
@ 2016-08-31 12:52                                       ` Sergey Senozhatsky
  2016-09-01  8:58                                         ` Petr Mladek
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-08-31 12:52 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (08/31/16 11:38), Petr Mladek wrote:
[..]
> Ah, I do not want to discourage you from finding a solution for these
> problems.

oh, Petr, I didn't mean it that way. that was poor/incorrect wording on my side.

> I just wanted to point out problems with this particular
> path of thinking (more per-CPU buffers, shuffling data between
> them and the main buffer and console).

sure, and I do appreciate your input.

that's a good summary.

> Sigh, there are many problems with printk(). I think we recently
> discussed the following problems:
> 
>   1. Hung task or blocked irq handler when preemption/irqs
>      are disabled and there are too many messages pushed to
>      the console.

    1.1. deferred printing, for instance due to sched throttling

    1.2. an unlucky printk() from an IRQ handler that succeeded in grabbing
the console semaphore via console_trylock().

once there is a moderate printk() flood and some of the console_unlock()
calls can be executed with preemption disabled, the chances of sched
throttling do increase. I've seen this many times in the past.


>   2. Potential deadlocks when calling wake_up_process() by
>      async printk and console_unlock().

    * there are many reasons for those recursive printk() calls -- some
can be addressed, some cannot. for instance, it doesn't matter how many
per-CPU buffers we use for the alternative printk() once the logbuf_lock
is corrupted.

another `deadlock' example would be:

SyS_ioctl
 do_vfs_ioctl
  tty_ioctl
   n_tty_ioctl
    tty_mode_ioctl
     set_termios
      tty_set_termios
       uart_set_termios
        uart_change_speed
         FOO_serial_set_termios
          spin_lock_irqsave(&port->lock)     // lock the output port
          ....
          !! WARN() or pr_err() or printk()
              vprintk_emit()
               /* console_trylock() */
               console_unlock()
                call_console_drivers()
                 FOO_write()
                  spin_lock_irqsave(&port->lock)     // already locked


>   3. Clean up the console handling to split manipulation with
>      consoles setting and pushing the messages. By other words,
>      allow to push the console messages only when wanted.

yes, that's a hard one. not only may console_unlock() force a completely
random task to do the printing loop, but console_lock() will force that task
to stay in TASK_UNINTERRUPTIBLE as long as the semaphore is busy. if that
task ends up being a user space process that must reply to systemd watchdog
(user space) signals, then things can get confusing -- apart from the fact
that that user space process can spend XX seconds printing the kernel's
messages from console_unlock().

even a single unfortunate console_trylock() and console_unlock() can lock up
the system, etc., basically because they bypass the printk_kthread.

so there are several outcomes from reworking console locking, but
the results may vary, depending on the particular setup.


>   4. Messed output with continuous lines.

    5. not 100% guaranteed printing on panic
    not entirely related to printk(), but to the console output mechanism in
general. we have console_flush_on_panic(), which ignores the console semaphore
state to increase our chances of seeing the backtrace. however, there is
more than just one lock involved: logbuf_lock, serial driver locks. so we may
start zap_locks() in console_flush_on_panic() to re-init the logbuf_lock,
but the underlying serial driver's locks are still in an unclear state. most
of the drivers (if not all of them) take the port->lock with IRQs disabled,
so if the panic CPU is not the one that holds the port->lock, then the
port->lock owner CPU will probably unlock the spin_lock before processing its
STOP_IPI. if it's the port->lock CPU that panics the system (nmi_panic() or
BUG()), then things can be bad.


> Our primary target was to solve the 1st problem with the async printk.
> It has stalled because we hit the other areas. Let's look at them
> from this point of view.
>
> Ad 2. The potential deadlock with wake_up_process(). It popped up
>       when using async printk during suspend.

right... and no per-CPU buffer could have helped here.

>       But it is not new! up() called by console_unlock() has the
>       same problem. I thought that it was different because
>       console_trylock() would prevent recursion but I was wrong.

correct. Byungchul hit that problem a while ago: https://lkml.org/lkml/2016/2/17/102

>       => If we use normal scheduler for the printk() thread, we should
>       be on the safe side. The deadlock should get fixed but it
>       will be enough to fix it later separately.

agree.

> Ad 4. The problems with messed-up continuous lines got a bit more
>       visible with the async printk.

yep. I also agree that it might be not so serious to fix it now (if ever).

> Did I miss anything?
> 
> I wonder how to separate the problems and make them more manageable.

so I was thinking for a moment about doing the recursion detection rework
before the async_printk, simply because better recursion detection is a nice
thing to have in the first place, and it may help us catch some of the
surprises that async_printk might have. but it will probably be more
problematic than I thought.

then async_printk. I have a refreshed series in my hands, addressing
Viresh's reports. it certainly makes things better, but it doesn't
eliminate all of the lockup/etc. sources. a console_unlock() doing
wake_up_process(printk_kthread) would make it better.

	-ss

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-08-31 12:52                                       ` Sergey Senozhatsky
@ 2016-09-01  8:58                                         ` Petr Mladek
  2016-09-02  7:58                                           ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-09-01  8:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote:
> On (08/31/16 11:38), Petr Mladek wrote:
> >   2. Potential deadlocks when calling wake_up_process() by
> >      async printk and console_unlock().
> 
>     * there are many reasons for those recursive printk() calls -- some
> can be addressed, some cannot. for instance, it doesn't matter how many
> per-CPU buffers we use for the alternative printk() once the logbuf_lock
> is corrupted.

Yup and BTW: Peter Zijlstra wants to avoid zapping locks whenever
possible because it corrupts the state. It might fix the immediate
lock state, but it might cause a deadlock via a double unlock.

> another `deadlock' example would be:
> 
> SyS_ioctl
>  do_vfs_ioctl
>   tty_ioctl
>    n_tty_ioctl
>     tty_mode_ioctl
>      set_termios
>       tty_set_termios
>        uart_set_termios
>         uart_change_speed
>          FOO_serial_set_termios
>           spin_lock_irqsave(&port->lock)     // lock the output port
>           ....
>           !! WARN() or pr_err() or printk()
>               vprintk_emit()
>                /* console_trylock() */
>                console_unlock()
>                 call_console_drivers()
>                  FOO_write()
>                   spin_lock_irqsave(&port->lock)     // already locked

Great catch! Of the already mentioned solutions, I would prefer
using deferred variants of WARN()/BUG()/printk() in these locations,
together with using lockdep to find them.

Also there is the Peter Zijlstra's idea of using a lockless
"early" console to debug the situations where it happens.
It might make sense to make such a console easy to use.

I am unable to find any other generic solution that would prevent this
from the printk() side at the moment.

>     5. not 100% guaranteed printing on panic
>     not entirely related to printk(), but to console output mechanism in
> general. we have console_flush_on_panic() which ignores console semaphore
> state, to increase our chances of seeing the backtrace. however, there are
> more that just one lock involved: logbuf_lock, serial driver locks. so we may
> start zap_locks() in console_flush_on_panic() to re-init the logbuf_lock,
> but underlying serial driver's locks are still in unclear state. most of
> the drivers (if not all of them) take the port->lock under disabled IRQs,
> so if panic-CPU is not the one that holds the port->lock then the port->lock
> owner CPU will probably unlock the spin_lock before processing its STOP_IPI.
> if it's the port->lock CPU that panic() the system (nmi_panic() or BUG())
> then things can be bad.

That might be very hard to solve in general as well. Again, PeterZ's
idea of the lockless console might help here.

> > I wonder how to separate the problems and make them more manageable.
> 
> so I was thinking for a moment about doing the recursion detection rework
> before the async_printk. just because better recursion detection is a nice
> thing to have in the first place and it probably may help us catching some
> of the surprises that async_printk might have. but it probably will be more
> problematic than I thought.
> 
> then async_printk. I have a refreshed series on my hands, addressing
> Viresh's reports. it certainly makes things better, but it doesn't
> eliminate all of the lockups/etc sources.

We must separate historically possible lockups from new regressions.
Only regressions should block the async printk series. Old
bugs should be fixed separately to keep the series manageable.

Anyway, I think that the async printk will make sense even
once we solve all the other issues. If async printk does not
cause regressions, why not merge it?


> a console_unlock() doing
> wake_up_process(printk_kthread) would make it better.

I am not sure what you mean by this.

Thanks for working on it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-09-01  8:58                                         ` Petr Mladek
@ 2016-09-02  7:58                                           ` Sergey Senozhatsky
  2016-09-02 15:15                                             ` Petr Mladek
  0 siblings, 1 reply; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-09-02  7:58 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

On (09/01/16 10:58), Petr Mladek wrote:
> On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote:
> > On (08/31/16 11:38), Petr Mladek wrote:
> > >   2. Potential deadlocks when calling wake_up_process() by
> > >      async printk and console_unlock().
> > 
> >     * there are many reasons to those recursive printk() calls -- some
> > can be addressed, some cannot. for instance, it doesn't matter how many
> > per-CPU buffers we use for alternative printk() once the logbuf_lock is
> > corrupted.
> 
> Yup and BTW: Peter Zijlstra wants to avoid zapping locks whenever
> possible because it corrupts the state. It might fix the immediate
> lock state, but it might cause a deadlock via a double unlock.

yes, don't really want to zap_locks() either.

[..]
> Great catch! Of the already mentioned solutions, I would prefer
> using deferred variants of WARN()/BUG()/printk() in these locations,
> together with using lockdep to find them.

hmm... need to think more. one of the problems is that we would have to
periodically "scan" for new WARNs/BUGs/etc., building all sorts of random
.configs

> Also there is the Peter Zijlstra's idea of using a lockless
> "early" console to debug the situations where it happens.
> It might make sense to make such a console easy to use.

aha, not really familiar with early console.

> I am unable to find any other generic solution that would prevent this
> from the printk() side at the moment.
> 
> >     5. not 100% guaranteed printing on panic
[..]
> That might be very hard to solve in general as well. Again the PeterZ's
> idea with the lockless console might help here.

"need to google it".

> > > I wonder how to separate the problems and make them more manageable.
> > 
> > so I was thinking for a moment about doing the recursion detection rework
> > before the async_printk. just because better recursion detection is a nice
> > thing to have in the first place and it probably may help us catching some
> > of the surprises that async_printk might have. but it probably will be more
> > problematic than I thought.
> > 
> > then async_printk. I have a refreshed series on my hands, addressing
> > Viresh's reports. it certainly makes things better, but it doesn't
> > eliminate all of the lockups/etc sources.
> 
> We must separate historical possible lockups and new regressions.
> Only regressions should block the async printk series. Old
> bugs should be fixed separately to keep the series manageable.

agree.

> Anyway, I think that async printk will make sense even
> once we solve all the other issues. If async printk does not
> cause regressions, why not get it in.

sure.

> > a console_unlock() doing
> > wake_up_process(printk_kthread) would make it better.
> 
> I am not sure what you mean by this.

I meant that this thing

	local_irq_save() // or preempt_disable()
	...
	if (console_trylock())
		console_unlock();
	...
	local_irq_restore() // or preempt_enable()


can easily lock up the system if console_trylock() was successful and there
are enough messages to print. printk_kthread can't help, because here we
basically enforce the `old' behavior. we have async printk, but not async
console output. tweaking console_unlock() to offload the actual printing loop
to printk_kthread would make the entire console output async:

	static void console_sync_flush_and_unlock(void)
	{
		for (;;) {
		...
			call_console_drivers();
		...
		}
	}

	void console_unlock(void)
	{
		if (!MOTORMOUTH && can_printk_async()) {
			up();
			wake_up_process(printk_kthread);
			return;
		}
		console_sync_flush_and_unlock();
	}

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-09-02  7:58                                           ` Sergey Senozhatsky
@ 2016-09-02 15:15                                             ` Petr Mladek
  2016-09-06  7:16                                               ` Sergey Senozhatsky
  0 siblings, 1 reply; 40+ messages in thread
From: Petr Mladek @ 2016-09-02 15:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton,
	Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	vlevenetz, Greg Kroah-Hartman

On Fri 2016-09-02 16:58:08, Sergey Senozhatsky wrote:
> On (09/01/16 10:58), Petr Mladek wrote:
> > On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote:
> > > a console_unlock() doing
> > > wake_up_process(printk_kthread) would make it better.
> > 
> > I am not sure what you mean by this.
> 
> I meant that this thing
> 
> 	local_irq_save() // or preempt_disable()
> 	...
> 	if (console_trylock())
> 		console_unlock();
> 	...
> 	local_irq_restore() // or preempt_enable()

I see.

> can easily lock up the system if console_trylock() was successful and there
> are enough messages to print. printk_kthread can't help, because here we
> basically enforce the `old' behavior. we have async printk, but not async
> console output. tweaking console_unlock() to offload the actual printing loop
> to printk_kthread would make the entire console output async:
> 
> 	static void console_sync_flush_and_unlock(void)
> 	{
> 		for (;;) {
> 		...
> 			call_console_drivers();
> 		...
> 		}
> 	}
> 
> 	void console_unlock(void)
> 	{
> 		if (!MOTORMOUTH && can_printk_async()) {
> 			up();
> 			wake_up_process(printk_kthread);
> 			return;
> 		}
> 		console_sync_flush_and_unlock();
> 	}

Something like this would make sense. But I would do it in a separate
patch(set). We need to go through all console_unlock() callers and
make sure that they are fine with the potential async behavior.
I would not complicate the async printk patchset by this.

Best Regards,
Petr


* Re: [PATCH v10 1/2] printk: Make printk() completely async
  2016-09-02 15:15                                             ` Petr Mladek
@ 2016-09-06  7:16                                               ` Sergey Senozhatsky
  0 siblings, 0 replies; 40+ messages in thread
From: Sergey Senozhatsky @ 2016-09-06  7:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Viresh Kumar,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, vlevenetz, Greg Kroah-Hartman

Hello,

On (09/02/16 17:15), Petr Mladek wrote:
[..]
> > I meant that this thing
> > 
> > 	local_irq_save() // or preempt_disable()
> > 	...
> > 	if (console_trylock())
> > 		console_unlock();
> > 	...
> > 	local_irq_restore() // or preempt_enable()
> 
> I see.
> 
> > can easily lock up the system if console_trylock() was successful and there
> > are enough messages to print. printk_kthread can't help, because here we
> > basically enforce the `old' behavior. we have async printk, but not async
> > console output. tweaking console_unlock() to offload the actual printing loop
> > to printk_kthread would make the entire console output async:
> > 
> > 	static void console_sync_flush_and_unlock(void)
> > 	{
> > 		for (;;) {
> > 		...
> > 			call_console_drivers();
> > 		...
> > 		}
> > 	}
> > 
> > 	void console_unlock(void)
> > 	{
> > 		if (!MOTORMOUTH && can_printk_async()) {
> > 			up();
> > 			wake_up_process(printk_kthread);
> > 			return;
> > 		}
> > 		console_sync_flush_and_unlock();
> > 	}
> 
> Something like this would make sense. But I would do it in a separate
> patch(set). We need to go through all console_unlock() callers and
> make sure that they are fine with the potential async behavior.
> I would not complicate the async printk patchset by this.

sure. just added one more item to the list.

	-ss


* Re: [PATCH v10 1/2] printk: Make printk() completely async
@ 2016-08-23  3:32 Andreas Mohr
  0 siblings, 0 replies; 40+ messages in thread
From: Andreas Mohr @ 2016-08-23  3:32 UTC (permalink / raw)
  To: Sergey Senozhatsky, Jan Kara, Petr Mladek, linux-kernel

[I *AGAIN* seem to have lost access to linux-kernel subscription,
thus no proper In-Reply-To: reply possible, ARGH, sorry]

Some random yet maybe still helpful observations from
reading this mail content
(Sat, 20 Aug 2016 14:24:30 +0900)
and reviewing trunk printk.c:

. vprintk_emit() static variables are not grouped
  (logbuf_cpu plus comment should probably be near other static:s
  since they are related,
  and perhaps add a separation line after static:s group
  to emphasize the distinction)
. many, many ugly static:s sitting in this file (build unit),
  i.e. no proper instance definition, with the result of e.g.
  being unable to easily convert things into log-multi-instance handling
  (not necessarily required, admittedly).
  Probably should group related static:s into helper struct:s instead
  and then instantiate these struct:s,
  to be passed as "session instance this"
  into properly *non*-free-running log functions
  (prime free-running functions where this is obvious are e.g.
  logbuf_has_space(), log_make_free_space()).
  Having hard-coded static:s sitting at random locations within a file
  IMHO simply is *MUCH* less flexible/clean than
  having things properly struct-*grouped* and thus
  flexibly *reusable*/*rearrangeable*.

Note that there also is the verbose/lengthy comment
/*
 * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
 * within the scheduler's rq lock. It must be released before calling
 * console_unlock() or anything else that might wake up a process.
 */
which would immediately lose large parts of its reason for existence
once things were done in a clean/structured/elegant/self-documenting manner
where all/many lock-affected variables (quote: "kmsg buffer, indices, counters")
would be grouped in one/few properly component-grouped struct:s and
where an *outer* struct definition
(to be passed as an "instance this" to implementation functions)
would then be used to treat/cure
the existing multi-context/multi-threading access disease,
by containing
one/few members of the struct(s) and
one *sibling* lock member, for access to
these inner lock-free implementation struct:s
in those unfortunate use situations
where we need to deal with multi-context access and thus
a lock needs to be used.
--> immediately visible *cleanly structured* implementation hierarchy,
plus the *reason* for this hierarchy.

Performance aspects IMHO shouldn't play much of a role here,
despite this being printk implementation code
(inefficiencies caused by modernization delay due to
insufficiently modernizable/layerable code
play a much larger role IMHO)


Put differently: the file seems somewhat chaotic
(i.e., with sufficiently large amounts of unrelated/distracting noise
within relevant scope width of developer examination efforts)
to sufficiently *prevent* people from easily reviewing it
and thus being able to come to implementation modernization conclusions
in a much faster manner,
thus it likely should better be refactored prior to
messing up/complicating things
while trying to solve some behaviour issues.

I'm usually focussing on achieving
non-redundancy / hierarchy *symmetry* / consistency -
with some minor luck
important treatment clues will then follow easily,
due to
much reduced total implementation examination scope.


. boot_delay_msec():
  . system_state != SYSTEM_BOOTING conditional has wrong evaluation
    order: configurations with boot_delay != 0 always need to check one
    conditional more than boot_delay == 0 configurations, even in
    *normal*(!) system operation (i.e. once system_state has proceeded
    beyond SYSTEM_BOOTING)
  . time_after(): perhaps it would be useful to use ktime
    (non-jiffies-based) APIs instead?
    (open questions: algorithmic performance, dependency availability
    at boot time?)

. the entire code in printk.c seems very busy/complex (as can also be witnessed via the impressively[not?] long list of #include:s) -
quite likely the code is doing some annoying layer violations,
seemingly drawing in as many dependencies as it can, rather than
implementing tiny, focussed helper functions
which then would get invoked by an *outer*, *aggregating* layer
which simply knows how to invoke these functions due to
consulting all relevant global, aggregation-related configuration aspects
*before* invoking these implementation-focussed functions;
prime sample: vprintk_emit(),
with its weirdly scheduler-dependency-affected epilogue parts -
quite likely it should do internal cleanly targeted queuing-only *layer implementation* stuff,
with "wakeup"/"scheduler"-related dependency crap then being done *outside* of
this focussed helper,
at suitable crucial situations/times during system management;
it all boils down to a matter of:
which distinct dependency components do we have in our system, and
how do we *avoid* having them all reference each other (and Santa Claus)
in an overly complex manner,
and with several global/*outer* system configuration state parts then also
*internally* interspersed to boot?
[printk_delay() is another minor example of dependency inversion, where the
global/*outer* aggregation-related configuration aspect printk_delay_msec
is handled *internally*, i.e. a probably unholy inversion against
properly separated implementation hierarchies]

. use of logbuf_lock seems to be randomly spread all over the file, rather than
  being locally contained in one linear block of
  small, reviewable number of
  clean, concise "logbuf_lock-augmenting layer" helper functions

. dump_stack_print_info() / show_regs_print_info() (etc.?) seem to be
  *users* of a successful printk() *implementation* and thus
  most certainly should not have any business whatsoever
  also sitting within
  the (already sufficiently complex / unreviewable - 3.3k lines!!)
  printk() *implementation mechanics* file

HTH & HAND,

Andreas Mohr


end of thread, other threads:[~2016-09-06  7:16 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky
2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky
2016-04-04 22:51   ` Andrew Morton
2016-04-05  5:17     ` Sergey Senozhatsky
2016-04-05  7:39       ` Sergey Senozhatsky
2016-04-06  0:19         ` Sergey Senozhatsky
2016-04-06  8:27     ` Jan Kara
2016-04-07  9:48       ` Sergey Senozhatsky
2016-04-07 12:08         ` Sergey Senozhatsky
2016-04-07 13:15           ` Jan Kara
2016-08-10 21:17       ` Viresh Kumar
2016-08-12  9:44         ` Petr Mladek
2016-08-15 14:26           ` Vladislav Levenetz
2016-08-16  9:04             ` Petr Mladek
2016-08-18  2:27           ` Sergey Senozhatsky
2016-08-18  9:33             ` Petr Mladek
2016-08-18  9:51               ` Sergey Senozhatsky
2016-08-18 10:56                 ` Petr Mladek
2016-08-19  6:32                   ` Sergey Senozhatsky
2016-08-19  9:54                     ` Petr Mladek
2016-08-19 19:00                       ` Jan Kara
2016-08-20  5:24                         ` Sergey Senozhatsky
2016-08-22  4:15                           ` Sergey Senozhatsky
2016-08-23 12:19                             ` Petr Mladek
2016-08-24  1:33                               ` Sergey Senozhatsky
2016-08-25 21:10                             ` Petr Mladek
2016-08-26  1:56                               ` Sergey Senozhatsky
2016-08-26  8:20                                 ` Sergey Senozhatsky
2016-08-30  9:29                                 ` Petr Mladek
2016-08-31  2:31                                   ` Sergey Senozhatsky
2016-08-31  9:38                                     ` Petr Mladek
2016-08-31 12:52                                       ` Sergey Senozhatsky
2016-09-01  8:58                                         ` Petr Mladek
2016-09-02  7:58                                           ` Sergey Senozhatsky
2016-09-02 15:15                                             ` Petr Mladek
2016-09-06  7:16                                               ` Sergey Senozhatsky
2016-08-23 13:03                           ` Petr Mladek
2016-08-23 13:48                         ` Petr Mladek
2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky
2016-08-23  3:32 [PATCH v10 1/2] printk: Make printk() completely async Andreas Mohr
