All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH v2 0/2] printk: Make printk() completely async
@ 2016-03-05 10:55 Sergey Senozhatsky
  2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky
  2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky
  0 siblings, 2 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Hello,

 RFC

 The patch set is a slightly updated Jan Kara's patches. The majority
of the changes are in 0001 (described below in details).

I've performed some tests and haven't spotted any problems, I'll
certainly do more tests. Any feedback is appreciated.

I had additional patches on top of the series, but I decided not to post
them. I'm unhappy with console semaphore, this must be a separate mail
loop and a patch set, some day.


The patch set is against next-20160304 with reverted NMI patches.


v2 (Sergey):
- use dedicated printk workqueue with WQ_MEM_RECLAIM bit
- fallback to system-wide workqueue only if allocation of printk_wq has
  failed
- do not use system_wq as a fallback wq. both console_lock() and onsole_unlock()
  can spend a significant amount of time; so we need to use system_long_wq.
- rework sync/!sync detection logic
  a) we can have deferred (in_sched) messages before we allocate printk_wq,
     so the only way to handle those messages is via IRQ context
  b) even in printk.synchronous mode, deferred messages must not be printed
     directly, and should go via IRQ context
  c) even if we allocated printk_wq and have !sync_printk mode, we must route
     deferred messages via IRQ context
- so this adds additional bool flags to vprint_emit() and introduces a new
  pending bit to `printk_pending'
- fix build on !PRINTK configs

Jan Kara (2):
  printk: Make printk() completely async
  printk: Skip messages on oops

 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 230 +++++++++++++++++++++++++++---------
 2 files changed, 181 insertions(+), 59 deletions(-)

-- 
2.8.0.rc0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky
@ 2016-03-05 10:55 ` Sergey Senozhatsky
  2016-03-06  6:32   ` Sergey Senozhatsky
  2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky
  1 sibling, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory). Thus by default printk() uses its own dedicated
workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
(console_unlock() is time unbound) only if it has failed to allocate
printk_wq. Another thing to mention, is that deferred printk() messages
may appear before printk_wq is allocated, so in the very beginning we
have to printk deferred messages the old way -- in IRQ context.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 194 +++++++++++++++++++++++++-----------
 2 files changed, 146 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0a21e4..5b01941 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..843f3fd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/workqueue.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,108 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ */
+static bool __read_mostly printk_sync;
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+static struct workqueue_struct *printk_wq;
+
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_IRQ_OUTPUT	(1<<1)
+#define PRINTK_PENDING_WQ_OUTPUT	(1<<2)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static int __init init_printk_workqueue(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0);
+	/*
+	 * Fallback to one of system-wide workqueues if we failed to
+	 * allocate a dedicated one. console_unlock() may take some
+	 * time to flush all of the messages, so we better use
+	 * system_long_wq.
+	 */
+	if (!printk_wq) {
+		pr_err("printk: Fallback to system_long workqueue\n");
+		printk_wq = system_long_wq;
+	}
+
+	return 0;
+}
+early_initcall(init_printk_workqueue);
+
+static void printing_work_func(struct work_struct *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static DECLARE_WORK(printing_work, printing_work_func);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	/*
+	 * We just schedule regular work to do the printing from irq work. We
+	 * don't want to do printing here directly as that happens with
+	 * interrupts disabled and thus is bad for interrupt latency. We also
+	 * don't want to queue regular work from vprintk_emit() as that gets
+	 * called in various difficult contexts where schedule_work() could
+	 * deadlock.
+	 */
+	if (pending & PRINTK_PENDING_WQ_OUTPUT)
+		queue_work(printk_wq, &printing_work);
+
+	/*
+	 * Deferred printk messages may come too early when we don't have
+	 * workqueue allocated yet. We can't print deferred messages directly,
+	 * because this may deadlock, so in the very beginning we have to
+	 * print deferred messages from IRQ context.
+	 */
+	if (pending & PRINTK_PENDING_IRQ_OUTPUT) {
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1712,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1723,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1752,10 +1858,36 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 *
+	 * Deferred messages (in_sched), however, must not be printed
+	 * synchronously.
+	 */
+	if (!sync_print) {
+		if (printk_wq && !in_panic) {
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_WQ_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else if (in_sched) {
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_IRQ_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else {
+			sync_print = true;
+		}
+	}
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2856,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC][PATCH v2 2/2] printk: Skip messages on oops
  2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky
  2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky
@ 2016-03-05 10:55 ` Sergey Senozhatsky
  1 sibling, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

When there are too many messages in the kernel printk buffer it can take
very long to print them to console (especially when using slow serial
console). This is undesirable during oops so when we encounter oops and
there are more than 100 messages to print, print just the newest 100
messages and then the oops message.

[ss: fix tiny config build]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 843f3fd..0bda01e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -268,6 +268,9 @@ static u64 console_seq;
 static u32 console_idx;
 static enum log_flags console_prev;
 
+/* current record sequence when oops happened */
+static u64 oops_start_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1773,6 +1776,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	if (oops_in_progress && !oops_start_seq)
+		oops_start_seq = log_next_seq;
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -2001,6 +2007,7 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
+static u64 oops_start_seq;
 static enum log_flags syslog_prev;
 static u64 log_first_seq;
 static u32 log_first_idx;
@@ -2326,6 +2333,12 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * When oops happens and there are more messages to be printed in the printk
+ * buffer that this, skip some mesages and print only this many newest messages.
+ */
+#define PRINT_MSGS_BEFORE_OOPS 100
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2394,7 +2407,28 @@ again:
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		/*
+		 * If oops happened and there are more than
+		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
+		 * skip them to make oops appear faster.
+		 */
+		if (oops_start_seq &&
+		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
+			len = sprintf(text,
+				"** %u printk messages dropped due to oops ** ",
+				(unsigned)(oops_start_seq - console_seq -
+					   PRINT_MSGS_BEFORE_OOPS));
+			if (console_seq < log_first_seq) {
+				console_seq = log_first_seq;
+				console_idx = log_first_idx;
+			}
+			while (console_seq <
+			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
+				console_idx = log_next(console_idx);
+				console_seq++;
+			}
+			console_prev = 0;
+		} else if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
 
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky
@ 2016-03-06  6:32   ` Sergey Senozhatsky
  2016-03-06  7:18     ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-06  6:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

On (03/05/16 19:55), Sergey Senozhatsky wrote:
[..]
> +static int __init init_printk_workqueue(void)
> +{
> +	if (printk_sync)
> +		return 0;
> +
> +	printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0);
> +	/*
> +	 * Fallback to one of system-wide workqueues if we failed to
> +	 * allocate a dedicated one. console_unlock() may take some
> +	 * time to flush all of the messages, so we better use
> +	 * system_long_wq.
> +	 */
> +	if (!printk_wq) {
> +		pr_err("printk: Fallback to system_long workqueue\n");
> +		printk_wq = system_long_wq;
> +	}
> +
> +	return 0;
> +}
> +early_initcall(init_printk_workqueue);
  ^^^^^^
better be late_initcall(). init_workqueues(), which creates pwq_cache kmem_cache,
is early initcall.

===8<===8<===8<====

>From d9cf33958febd863ceb25c59d447aa9500427c30 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory). Thus by default printk() uses its own dedicated
workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
(console_unlock() is time unbound) only if it has failed to allocate
printk_wq. Another thing to mention, is that deferred printk() messages
may appear before printk_wq is allocated, so in the very beginning we
have to printk deferred messages the old way -- in IRQ context.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 194 +++++++++++++++++++++++++-----------
 2 files changed, 146 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0a21e4..5b01941 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..d287fd7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/workqueue.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,108 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ */
+static bool __read_mostly printk_sync;
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+static struct workqueue_struct *printk_wq;
+
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_IRQ_OUTPUT	(1<<1)
+#define PRINTK_PENDING_WQ_OUTPUT	(1<<2)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static int __init init_printk_workqueue(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0);
+	/*
+	 * Fallback to one of system-wide workqueues if we failed to
+	 * allocate a dedicated one. console_unlock() may take some
+	 * time to flush all of the messages, so we better use
+	 * system_long_wq.
+	 */
+	if (!printk_wq) {
+		pr_err("printk: Fallback to system_long workqueue\n");
+		printk_wq = system_long_wq;
+	}
+
+	return 0;
+}
+late_initcall(init_printk_workqueue);
+
+static void printing_work_func(struct work_struct *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static DECLARE_WORK(printing_work, printing_work_func);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	/*
+	 * We just schedule regular work to do the printing from irq work. We
+	 * don't want to do printing here directly as that happens with
+	 * interrupts disabled and thus is bad for interrupt latency. We also
+	 * don't want to queue regular work from vprintk_emit() as that gets
+	 * called in various difficult contexts where schedule_work() could
+	 * deadlock.
+	 */
+	if (pending & PRINTK_PENDING_WQ_OUTPUT)
+		queue_work(printk_wq, &printing_work);
+
+	/*
+	 * Deferred printk messages may come too early when we don't have
+	 * workqueue allocated yet. We can't print deferred messages directly,
+	 * because this may deadlock, so in the very beginning we have to
+	 * print deferred messages from IRQ context.
+	 */
+	if (pending & PRINTK_PENDING_IRQ_OUTPUT) {
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1712,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1723,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1752,10 +1858,36 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 *
+	 * Deferred messages (in_sched), however, must not be printed
+	 * synchronously.
+	 */
+	if (!sync_print) {
+		if (printk_wq && !in_panic) {
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_WQ_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else if (in_sched) {
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_IRQ_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else {
+			sync_print = true;
+		}
+	}
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2856,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0.1.gd285ab0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06  6:32   ` Sergey Senozhatsky
@ 2016-03-06  7:18     ` Tetsuo Handa
  2016-03-06  9:35       ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2016-03-06  7:18 UTC (permalink / raw)
  To: sergey.senozhatsky.work, akpm
  Cc: jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack

Sergey Senozhatsky wrote:
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory). Thus by default printk() uses its own dedicated
> workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
> (console_unlock() is time unbound) only if it has failed to allocate
> printk_wq. Another thing to mention, is that deferred printk() messages
> may appear before printk_wq is allocated, so in the very beginning we
> have to printk deferred messages the old way -- in IRQ context.

I think we should not depend on system_long_wq which does not have
WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM),
such systems won't be able to start userspace programs.

Moreover, I don't like use of a workqueue even if printk_wq was allocated
with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
the OOM reaper chose a dedicated kernel thread rather than a workqueue
( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ).

Blocking actual printing until ongoing workqueue item calls schedule_timeout_*()
is not nice (see commit 373ccbe59270 and 564e81a57f97). Use of WQ_MEM_RECLAIM
means we add a task_struct for that workqueue. Thus, using a kernel thread does
not change total number of task_struct compared to WQ_MEM_RECLAIM approach.
I think actual printing should occur as soon as possible rather than randomly
deferred until workqueue item sleeps.

> +static void printing_work_func(struct work_struct *work)
> +{
> +	console_lock();
> +	console_unlock();
> +}

Is this safe? If somebody invokes the OOM killer between console_lock()
and console_unlock(), won't this cause OOM killer messages not printed?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06  7:18     ` Tetsuo Handa
@ 2016-03-06  9:35       ` Sergey Senozhatsky
  2016-03-06 11:06         ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-06  9:35 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky, jack

On (03/06/16 16:18), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> > printk() is expected to work under different conditions and in different
> > scenarios, including corner cases of OOM when all of the workers are busy
> > (e.g. allocating memory). Thus by default printk() uses its own dedicated
> > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
> > (console_unlock() is time unbound) only if it has failed to allocate
> > printk_wq. Another thing to mention, is that deferred printk() messages
> > may appear before printk_wq is allocated, so in the very beginning we
> > have to printk deferred messages the old way -- in IRQ context.
> 
> I think we should not depend on system_long_wq which does not have
> WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM),
> such systems won't be able to start userspace programs.

well, yes. system_long_wq is a fallback, but probably just BUG_ON(!printk_wq)
would be a better thing to do.

> Moreover, I don't like use of a workqueue even if printk_wq was allocated
> with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
> the OOM reaper chose a dedicated kernel thread rather than a workqueue
> ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ).
>
> Blocking actual printing until ongoing workqueue item calls schedule_timeout_*()
> is not nice (see commit 373ccbe59270 and 564e81a57f97).

printk and console_drivers are expected to be 'callable' from any
context, including IRQs, or under spin_locks, etc. so neither part of
	printk->console_unlock()->console_driver->write()
can sleep.

console_driver->write() is quite often something like this

	foo_write()
	{
		spin_lock_irqsave(&port->lock);
		uart_console_write( ... foo_put_char);
		spin_unlock_iqrestore(&port->lock);
	}

and foo_put_char(...), perhaps, waits for device to become ready
transaction

	while (!(UART_GET_STATUS(port) & TXEMPTY))
		cpu_relax();

and then sends out the character to the device

	UART_SET_DATA(port, (unsigned char)ch)



so an already executing printk_wq item shouldn't block because of
OOM happening on another CPU.

do you mean a new worker allocation delay and a MAYDAY timer delay?


> Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, using
> a kernel thread does not change total number of task_struct compared to WQ_MEM_RECLAIM
> approach. I think actual printing should occur as soon as possible rather than randomly
> deferred until workqueue item sleeps.

hm, need to take a look.


> > +static void printing_work_func(struct work_struct *work)
> > +{
> > +	console_lock();
> > +	console_unlock();
> > +}
> 
> Is this safe? If somebody invokes the OOM killer between console_lock()
> and console_unlock(), won't this cause OOM killer messages not printed?

between console_lock() and console_unlock() nothing can steal that lock
from worker; so once a worker grabbed the console_lock() it will print
(and console_lock() does not mean "lock logbuf", printk() from other CPUs
can add messages to the logbuf while worker loops in console_unlock()).

if you mean something like

	console_lock();
	for (...) {
		do_foo() {
			...
				pr_err(" ... foo message ...\n");
			...
		}
	}
	console_unlock();

then yes, nothing will be printed until that process executes console_unlock(),
because it's console_unlock() that pushes the messages to console drivers.
(console_lock()/console_unlock() have other problems and issues, which are
not addressed in this patch set). there is only one function that ignores the
state of console semaphore:
	panic()->console_flush_on_panic()

but that's a separate thing, not related to this patch set. if you mean something
else here, then please more details.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06  9:35       ` Sergey Senozhatsky
@ 2016-03-06 11:06         ` Tetsuo Handa
  2016-03-06 13:27           ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2016-03-06 11:06 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack

Sergey Senozhatsky wrote:
> On (03/06/16 16:18), Tetsuo Handa wrote:
> > Moreover, I don't like use of a workqueue even if printk_wq was allocated
> > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
> > the OOM reaper chose a dedicated kernel thread rather than a workqueue
> > ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ).
> >
> > Blocking actual printing until ongoing workqueue item calls schedule_timeout_*()
> > is not nice (see commit 373ccbe59270 and 564e81a57f97).
> 
> do you mean a new worker allocation delay and a MAYDAY timer delay?
> 

I don't know what MAYDAY is. I'm talking about a situation where printing_work
work item is not processed (i.e. printing_work_func() is not called) until
current work item calls schedule_timeout_*().

We had a problem that since vmstat_work work item was using system_wq,
vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
kworker was looping inside memory allocator without calling schedule_timeout_*()
due to disk_events_workfn() doing GFP_NOIO allocation).

----------
[  271.579276] MemAlloc: kworker/0:56(7399) gfp=0x2400000 order=0 delay=129294
[  271.581237]  ffff88007c78fa08 ffff8800778f8c80 ffff88007c790000 ffff8800778f8c80
[  271.583329]  0000000002400000 0000000000000000 ffff8800778f8c80 ffff88007c78fa20
[  271.585391]  ffffffff8162aa9d 0000000000000001 ffff88007c78fa30 ffffffff8162aac7
[  271.587463] Call Trace:
[  271.588512]  [<ffffffff8162aa9d>] preempt_schedule_common+0x18/0x2b
[  271.590243]  [<ffffffff8162aac7>] _cond_resched+0x17/0x20
[  271.591830]  [<ffffffff8111fafe>] __alloc_pages_nodemask+0x64e/0xcc0
[  271.593561]  [<ffffffff8116a3b2>] ? __kmalloc+0x22/0x190
[  271.595119]  [<ffffffff81160ce7>] alloc_pages_current+0x87/0x110
[  271.596778]  [<ffffffff812e95f4>] bio_copy_kern+0xc4/0x180
[  271.598342]  [<ffffffff810a6a00>] ? wait_woken+0x80/0x80
[  271.599878]  [<ffffffff812f25f0>] blk_rq_map_kern+0x70/0x130
[  271.601481]  [<ffffffff812ece35>] ? blk_get_request+0x75/0xe0
[  271.603100]  [<ffffffff814433fd>] scsi_execute+0x12d/0x160
[  271.604657]  [<ffffffff81443524>] scsi_execute_req_flags+0x84/0xf0
[  271.606339]  [<ffffffffa01db742>] sr_check_events+0xb2/0x2a0 [sr_mod]
[  271.608141]  [<ffffffff8109cbfc>] ? set_next_entity+0x6c/0x6a0
[  271.609830]  [<ffffffffa01cf163>] cdrom_check_events+0x13/0x30 [cdrom]
[  271.611610]  [<ffffffffa01dbb85>] sr_block_check_events+0x25/0x30 [sr_mod]
[  271.613429]  [<ffffffff812fc7eb>] disk_check_events+0x5b/0x150
[  271.615065]  [<ffffffff812fc8f1>] disk_events_workfn+0x11/0x20
[  271.616699]  [<ffffffff810827c5>] process_one_work+0x135/0x310
[  271.618321]  [<ffffffff81082abb>] worker_thread+0x11b/0x4a0
[  271.620018]  [<ffffffff810829a0>] ? process_one_work+0x310/0x310
[  271.622022]  [<ffffffff81087e53>] kthread+0xd3/0xf0
[  271.623533]  [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0
[  271.625487]  [<ffffffff8162f09f>] ret_from_fork+0x3f/0x70
[  271.627175]  [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0
----------

Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
printing_work_func() will not be called until current work item calls
schedule_timeout_*(). That will be an undesirable random delay. If you use
a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
we can avoid this random delay.

> > > +static void printing_work_func(struct work_struct *work)
> > > +{
> > > +	console_lock();
> > > +	console_unlock();
> > > +}
> > 
> > Is this safe? If somebody invokes the OOM killer between console_lock()
> > and console_unlock(), won't this cause OOM killer messages not printed?
> 
> if you mean something like
> 
> 	console_lock();
> 	for (...) {
> 		do_foo() {
> 			...
> 				pr_err(" ... foo message ...\n");
> 			...
> 		}
> 	}
> 	console_unlock();
> 
> then yes, nothing will be printed until that process executes console_unlock(),
> because it's console_unlock() that pushes the messages to console drivers.

Yes, I meant a sequence like

  console_lock();
  ptr = kmalloc(GFP_KERNEL);
  kfree(ptr);
  console_unlock();

and kmalloc() prints OOM killer messages rather than failing that allocation.
Are we sure that there is no such usage?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06 11:06         ` Tetsuo Handa
@ 2016-03-06 13:27           ` Sergey Senozhatsky
  2016-03-06 14:54             ` Tetsuo Handa
  2016-03-07  8:22             ` Jan Kara
  0 siblings, 2 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-06 13:27 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky, jack

On (03/06/16 20:06), Tetsuo Handa wrote:
[..]
> > do you mean a new worker allocation delay and a MAYDAY timer delay?
> > 
> 
> I don't know what MAYDAY is. I'm talking about a situation where printing_work
> work item is not processed (i.e. printing_work_func() is not called) until
> current work item calls schedule_timeout_*().
> 
> We had a problem that since vmstat_work work item was using system_wq,
> vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
> kworker was looping inside memory allocator without calling schedule_timeout_*()
> due to disk_events_workfn() doing GFP_NOIO allocation).

hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
(WQ_MEM_RECLAIM).

[..]
> Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> printing_work_func() will not be called until current work item calls
> schedule_timeout_*(). That will be an undesirable random delay. If you use
> a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> we can avoid this random delay.

hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
need to look more.

[..]
> > 	console_lock();
> > 	for (...) {
> > 		do_foo() {
> > 			...
> > 				pr_err(" ... foo message ...\n");
> > 			...
> > 		}
> > 	}
> > 	console_unlock();
> > 
> > then yes, nothing will be printed until that process executes console_unlock(),
> > because it's console_unlock() that pushes the messages to console drivers.
> 
> Yes, I meant a sequence like
> 
>   console_lock();
>   ptr = kmalloc(GFP_KERNEL);
>   kfree(ptr);
>   console_unlock();
> 
> and kmalloc() prints OOM killer messages rather than failing that allocation.
> Are we sure that there is no such usage?

such usage is quite possible.

problems that I have with console_lock()/console_unlock() is that
these functions serve a double purpose: exclusive printk() lock and a
console_drivers list lock.

**** I haven't really thought about it yet, but I want to split it. ****

console_lock()/console_unlock() can be executed by user space processes
(inside system calls). for example:

SyS_open()
 ...
  chrdev_open()
   tty_open()
    console_device()
     console_lock()
     console_unlock()
      for (;;) {
       call_console_drivers()
      }

or doing `cat /proc/consoles`

SyS_read()
 vfs_read()
  proc_reg_read()
   seq_read()
    c_stop()
     console_unlock()
      for (;;) {
       call_console_drivers()
      }

which can introduce two nasty problems:

1) console_lock() may put user space process in TASK_UNINTERRUPTIBLE for
   unknown period of time -- until current console_sem owner will not
   finish print loop in console_unlock(). no signals, heartbeats, etc.
   will be processed by this user space process.

2) user space process may have to spend an unknown period of time in
   console_unlock() later, pushing "3rd party" kernel messages to
   console drivers. again, not really good.

(kthreads can suffer here too, sure).


in the examples above, a process just wanted to iterate the console_drivers
list in read access mode. so, for instance, in

struct tty_driver *console_device(int *index)
{
        struct console *c;
        struct tty_driver *driver = NULL;

        console_lock();
        for_each_console(c) {
                if (!c->device)
                        continue;
                driver = c->device(c, index);
                if (driver)
                        break;
        }
        console_unlock();
        return driver;
}

instead of console_lock()/console_unlock()->call_console_drivers()
it could do (very schematically)

	read_lock_console();
          for_each_console(c) {
                  if (!c->device)
                          continue;
                  driver = c->device(c, index);
                  if (driver)
                          break;
          }
	read_unlock_console();

and in functions that modify the list, the lock can be acquired in
write mode. example,

int unregister_console(struct console *console)
{
        write_lock_console();
...
                for (a=console_drivers->next, b=console_drivers ;
                     a; b=a, a=b->next) {
                        if (a == console) {
                                b->next = a->next;
                                res = 0;
                                break;
                        }
                }
        }
...
        if (console_drivers != NULL && console->flags & CON_CONSDEV)
                console_drivers->flags |= CON_CONSDEV;

        console->flags &= ~CON_ENABLED;
        write_unlock_unlock();



printk(), thus, will take its own "another" exclusive lock, to guarantee
that only one process can call_console_drivers(), and it will take the
console_drivers list lock in read mode. so other process(-es) that also
want to access console_drivers list in read mode will not wait in
TASK_UNINTERRUPTIBLE.

of course, this means that console_unlock() now does not print
anything to console drivers. it's printk() duty to do this.

and the last statement can be very hard to sell; because I don't know
for sure if there is a code in the kernel that depends on the fact that
console_lock() stops printk() and that console_unlock() prints all of
the printk messages.


that "lock split" also can fix another *theoretical* scenario: suppose,
that for some reason someone's setup has both a) huge number of printk()
calls and b) a relatively huge number of console_lock()/console_unlock()
calls, all happening simultaneously. while async printk helps in (a),
case (b) "detours" the printk async code; and if cpus at lest some of the
cpus that do console_lock()/console_unlock() also disable IRQs or
preemption, then lockups and all the bad things are very much likely.


just an idea.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06 13:27           ` Sergey Senozhatsky
@ 2016-03-06 14:54             ` Tetsuo Handa
  2016-03-07  8:22             ` Jan Kara
  1 sibling, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-03-06 14:54 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack

Sergey Senozhatsky wrote:
> such usage is quite possible.
> 
> problems that I have with console_lock()/console_unlock() is that
> these functions serve a double purpose: exclusive printk() lock and a
> console_drivers list lock.

Yes, I don't like it too.

> 
> **** I haven't really thought about it yet, but I want to split it. ****
> 

Since writing to console does not call schedule(), I think
rcu_read_lock()/rcu_read_unlock()/synchronize_rcu() (or synchronize_rcu_*() ?)
can manage it without using
read_lock_console()/read_unlock_console()/write_lock_console()/write_unlock_console().

Replacing console_lock()/console_unlock() for protecting console_drivers list
with RCU might be helpful.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-06 13:27           ` Sergey Senozhatsky
  2016-03-06 14:54             ` Tetsuo Handa
@ 2016-03-07  8:22             ` Jan Kara
  2016-03-07 10:12               ` Sergey Senozhatsky
  2016-03-07 15:42               ` Tejun Heo
  1 sibling, 2 replies; 28+ messages in thread
From: Jan Kara @ 2016-03-07  8:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky, jack

On Sun 06-03-16 22:27:03, Sergey Senozhatsky wrote:
> On (03/06/16 20:06), Tetsuo Handa wrote:
> [..]
> > > do you mean a new worker allocation delay and a MAYDAY timer delay?
> > > 
> > 
> > I don't know what MAYDAY is. I'm talking about a situation where printing_work
> > work item is not processed (i.e. printing_work_func() is not called) until
> > current work item calls schedule_timeout_*().
> > 
> > We had a problem that since vmstat_work work item was using system_wq,
> > vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
> > kworker was looping inside memory allocator without calling schedule_timeout_*()
> > due to disk_events_workfn() doing GFP_NOIO allocation).
> 
> hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> (WQ_MEM_RECLAIM).
> 
> [..]
> > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > printing_work_func() will not be called until current work item calls
> > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> > we can avoid this random delay.
> 
> hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> need to look more.

Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
creating a worker process and wakes up rescuer thread. However I don't see
that as a problem...

> > > 	console_lock();
> > > 	for (...) {
> > > 		do_foo() {
> > > 			...
> > > 				pr_err(" ... foo message ...\n");
> > > 			...
> > > 		}
> > > 	}
> > > 	console_unlock();
> > > 
> > > then yes, nothing will be printed until that process executes console_unlock(),
> > > because it's console_unlock() that pushes the messages to console drivers.
> > 
> > Yes, I meant a sequence like
> > 
> >   console_lock();
> >   ptr = kmalloc(GFP_KERNEL);
> >   kfree(ptr);
> >   console_unlock();
> > 
> > and kmalloc() prints OOM killer messages rather than failing that allocation.
> > Are we sure that there is no such usage?
> 
> such usage is quite possible.
> 
> problems that I have with console_lock()/console_unlock() is that
> these functions serve a double purpose: exclusive printk() lock and a
> console_drivers list lock.

Well, but changing how console locking works is a separate issue, isn't it?
So please as a separate patch set if you want to try it.

Actually I don't think changing the locking will be so easy.
console_lock/unlock is used e.g. for console blanking where you need to
block any printing while you call ->unblank() for each console. That being
said I don't think improvement is impossible, just given my experiences
with console / printk code there will be surprises waiting for you :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07  8:22             ` Jan Kara
@ 2016-03-07 10:12               ` Sergey Senozhatsky
  2016-03-07 10:52                 ` Jan Kara
  2016-03-07 11:10                 ` Tetsuo Handa
  2016-03-07 15:42               ` Tejun Heo
  1 sibling, 2 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-07 10:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj,
	linux-kernel, sergey.senozhatsky

Hello,

On (03/07/16 09:22), Jan Kara wrote:
[..]
> > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > (WQ_MEM_RECLAIM).
> > 
> > [..]
> > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > > printing_work_func() will not be called until current work item calls
> > > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> > > we can avoid this random delay.
> > 
> > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > need to look more.
> 
> Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> creating a worker process and wakes up rescuer thread. However I don't see
> that as a problem...

yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
I've slightly tested OOM-kill on my desktop system and haven't spotted any
printk delays (well, a test on desktop is not really representative, of
course).


the only thing that so far grabbed my attention - is

	__this_cpu_or(printk_pending)
	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));

a _theoretical_ corner case here is when we have only one CPU doing a bunch
of printk()s and this CPUs disables irqs in advance
	local_irq_save
	for (...)
		printk()
	local_irq_restore()

if no other CPUs see `printk_pending' then nothing will be printed up
until local_irq_restore() (assuming that IRQ disable time is withing
the hardlockup detection threshold). if any other CPUs concurrently
execute printk then we are fine, but
	a) if none -- then we probably have a small change in behaviour
and
	b) UP systems

[..]
> > such usage is quite possible.
> > 
> > problems that I have with console_lock()/console_unlock() is that
> > these functions serve a double purpose: exclusive printk() lock and a
> > console_drivers list lock.
> 
> Well, but changing how console locking works is a separate issue, isn't it?
> So please as a separate patch set if you want to try it.

absolutely agree, this is a separate thing.


> Actually I don't think changing the locking will be so easy.

again, agree. splitting any lock is always tricky and risky.
especially if we talk about console_sem. it can easily add up
new deadlocks, make some fbcon unhappy, etc. etc.

register_console()
	write_lock_console_lock()
		if (error)
			printk()
				printk_lock()
					read_lock_console_lock() <- eadlock

and so on and so forth; I'm not very enthusiastic to change this at
the moment.

	-ss

> console_lock/unlock is used e.g. for console blanking where you need to
> block any printing while you call ->unblank() for each console. That being
> said I don't think improvement is impossible, just given my experiences
> with console / printk code there will be surprises waiting for you :).
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 10:12               ` Sergey Senozhatsky
@ 2016-03-07 10:52                 ` Jan Kara
  2016-03-07 12:16                   ` Jan Kara
  2016-03-07 14:40                   ` Sergey Senozhatsky
  2016-03-07 11:10                 ` Tetsuo Handa
  1 sibling, 2 replies; 28+ messages in thread
From: Jan Kara @ 2016-03-07 10:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky

On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/07/16 09:22), Jan Kara wrote:
> [..]
> > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > > (WQ_MEM_RECLAIM).
> > > 
> > > [..]
> > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > > > printing_work_func() will not be called until current work item calls
> > > > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> > > > we can avoid this random delay.
> > > 
> > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > > need to look more.
> > 
> > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > creating a worker process and wakes up rescuer thread. However I don't see
> > that as a problem...
> 
> yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> I've slightly tested OOM-kill on my desktop system and haven't spotted any
> printk delays (well, a test on desktop is not really representative, of
> course).
> 
> 
> the only thing that so far grabbed my attention - is
> 
> 	__this_cpu_or(printk_pending)
> 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 
> a _theoretical_ corner case here is when we have only one CPU doing a bunch
> of printk()s and this CPUs disables irqs in advance
> 	local_irq_save
> 	for (...)
> 		printk()
> 	local_irq_restore()
> 
> if no other CPUs see `printk_pending' then nothing will be printed up
> until local_irq_restore() (assuming that IRQ disable time is withing
> the hardlockup detection threshold). if any other CPUs concurrently
> execute printk then we are fine, but
> 	a) if none -- then we probably have a small change in behaviour
> and
> 	b) UP systems

So for UP systems, we should by default disable async printing anyway I
suppose. It is just a pointless overhead. So please just make printk_sync
default to true if !CONFIG_SMP.

When IRQs are disabled, you're right we will have a change in behavior. I
don't see an easy way of avoiding delaying of printk until IRQs get
enabled. I don't want to queue work directly because that creates
possibility for lock recursion in queue_work(). And playing some tricks
with irq_works isn't easy either - you cannot actually rely on any other
CPU doing anything (even a timer tick) because of NOHZ.

So if this will be a problem in practice, using a kthread will probably be
the easiest solution.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 10:12               ` Sergey Senozhatsky
  2016-03-07 10:52                 ` Jan Kara
@ 2016-03-07 11:10                 ` Tetsuo Handa
  2016-03-07 14:36                   ` Sergey Senozhatsky
  1 sibling, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2016-03-07 11:10 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky

Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/07/16 09:22), Jan Kara wrote:
> [..]
> > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > > (WQ_MEM_RECLAIM).
> > > 
> > > [..]
> > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > > > printing_work_func() will not be called until current work item calls
> > > > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> > > > we can avoid this random delay.
> > > 
> > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > > need to look more.
> > 
> > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > creating a worker process and wakes up rescuer thread. However I don't see
> > that as a problem...
> 
> yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> I've slightly tested OOM-kill on my desktop system and haven't spotted any
> printk delays (well, a test on desktop is not really representative, of
> course).

I wanted to tell that if kworker is running a buggy function that calls
cond_resched() but does not call schedule_timeout_*() for very long time,
such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for
waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can
still cause huge delay if kworker does not call schedule_timeout_*().
Not specific to OOM-killer or vmstat.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 10:52                 ` Jan Kara
@ 2016-03-07 12:16                   ` Jan Kara
  2016-03-07 12:37                     ` Tetsuo Handa
                                       ` (2 more replies)
  2016-03-07 14:40                   ` Sergey Senozhatsky
  1 sibling, 3 replies; 28+ messages in thread
From: Jan Kara @ 2016-03-07 12:16 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky

On Mon 07-03-16 11:52:48, Jan Kara wrote:
> On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > On (03/07/16 09:22), Jan Kara wrote:
> > [..]
> > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > > > (WQ_MEM_RECLAIM).
> > > > 
> > > > [..]
> > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > > > > printing_work_func() will not be called until current work item calls
> > > > > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> > > > > we can avoid this random delay.
> > > > 
> > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > > > need to look more.
> > > 
> > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > > creating a worker process and wakes up rescuer thread. However I don't see
> > > that as a problem...
> > 
> > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> > I've slightly tested OOM-kill on my desktop system and haven't spotted any
> > printk delays (well, a test on desktop is not really representative, of
> > course).
> > 
> > 
> > the only thing that so far grabbed my attention - is
> > 
> > 	__this_cpu_or(printk_pending)
> > 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > 
> > a _theoretical_ corner case here is when we have only one CPU doing a bunch
> > of printk()s and this CPUs disables irqs in advance
> > 	local_irq_save
> > 	for (...)
> > 		printk()
> > 	local_irq_restore()
> > 
> > if no other CPUs see `printk_pending' then nothing will be printed up
> > until local_irq_restore() (assuming that IRQ disable time is withing
> > the hardlockup detection threshold). if any other CPUs concurrently
> > execute printk then we are fine, but
> > 	a) if none -- then we probably have a small change in behaviour
> > and
> > 	b) UP systems
> 
> So for UP systems, we should by default disable async printing anyway I
> suppose. It is just a pointless overhead. So please just make printk_sync
> default to true if !CONFIG_SMP.
> 
> When IRQs are disabled, you're right we will have a change in behavior. I
> don't see an easy way of avoiding delaying of printk until IRQs get
> enabled. I don't want to queue work directly because that creates
> possibility for lock recursion in queue_work(). And playing some tricks
> with irq_works isn't easy either - you cannot actually rely on any other
> CPU doing anything (even a timer tick) because of NOHZ.
> 
> So if this will be a problem in practice, using a kthread will probably be
> the easiest solution.

Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
create kthread anyway as a rescuer thread, it may be the simplest to just
go back to using a single kthread for printing. What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 12:16                   ` Jan Kara
@ 2016-03-07 12:37                     ` Tetsuo Handa
  2016-03-07 15:10                     ` Sergey Senozhatsky
  2016-03-09  6:09                     ` Sergey Senozhatsky
  2 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-03-07 12:37 UTC (permalink / raw)
  To: jack, sergey.senozhatsky.work
  Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky

Jan Kara wrote:
> > So for UP systems, we should by default disable async printing anyway I
> > suppose. It is just a pointless overhead. So please just make printk_sync
> > default to true if !CONFIG_SMP.
> > 
> > When IRQs are disabled, you're right we will have a change in behavior. I
> > don't see an easy way of avoiding delaying of printk until IRQs get
> > enabled. I don't want to queue work directly because that creates
> > possibility for lock recursion in queue_work(). And playing some tricks
> > with irq_works isn't easy either - you cannot actually rely on any other
> > CPU doing anything (even a timer tick) because of NOHZ.
> > 
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?
> 

I think a plain kthread is better. printing_work_func() can become a buggy
function. console_unlock() from printing_work_func() can occupy kworker for
many seconds due to looping with cond_resched() (i.e. without calling
schedule_timeout_*()) from schedulable context which can in turn defer
other work items.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 11:10                 ` Tetsuo Handa
@ 2016-03-07 14:36                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-07 14:36 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky

On (03/07/16 20:10), Tetsuo Handa wrote:
[..]
> > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > > > need to look more.
> > > 
> > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > > creating a worker process and wakes up rescuer thread. However I don't see
> > > that as a problem...
> > 
> > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> > I've slightly tested OOM-kill on my desktop system and haven't spotted any
> > printk delays (well, a test on desktop is not really representative, of
> > course).
> 
> I wanted to tell that if kworker is running a buggy function that calls
> cond_resched() but does not call schedule_timeout_*() for very long time,
> such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for
> waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can
> still cause huge delay if kworker does not call schedule_timeout_*().
> Not specific to OOM-killer or vmstat.

your point is taken. thanks.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 10:52                 ` Jan Kara
  2016-03-07 12:16                   ` Jan Kara
@ 2016-03-07 14:40                   ` Sergey Senozhatsky
  1 sibling, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-07 14:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj,
	linux-kernel, sergey.senozhatsky

On (03/07/16 11:52), Jan Kara wrote:
[..]
> So for UP systems, we should by default disable async printing anyway I
> suppose. It is just a pointless overhead. So please just make printk_sync
> default to true if !CONFIG_SMP.

ok, thanks.

> When IRQs are disabled, you're right we will have a change in behavior. I
> don't see an easy way of avoiding delaying of printk until IRQs get
> enabled. I don't want to queue work directly because that creates
> possibility for lock recursion in queue_work(). And playing some tricks
> with irq_works isn't easy either - you cannot actually rely on any other
> CPU doing anything (even a timer tick) because of NOHZ.

yes. I thought about some sort of PRINTK_IPI on smp, and queue work from
PRINTK_IPI. which is a bit insane.

> So if this will be a problem in practice, using a kthread will probably be
> the easiest solution.

probably kthread is the way to go then. Tetsuo's bad experience with
workqueues also sounds a bit alarming.

I'll post a new patch soon.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 12:16                   ` Jan Kara
  2016-03-07 12:37                     ` Tetsuo Handa
@ 2016-03-07 15:10                     ` Sergey Senozhatsky
  2016-03-07 15:49                       ` Tejun Heo
  2016-03-09  6:09                     ` Sergey Senozhatsky
  2 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-07 15:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj,
	linux-kernel, sergey.senozhatsky

On (03/07/16 13:16), Jan Kara wrote:
[..]
> > So for UP systems, we should by default disable async printing anyway I
> > suppose. It is just a pointless overhead. So please just make printk_sync
> > default to true if !CONFIG_SMP.
> > 
> > When IRQs are disabled, you're right we will have a change in behavior. I
> > don't see an easy way of avoiding delaying of printk until IRQs get
> > enabled. I don't want to queue work directly because that creates
> > possibility for lock recursion in queue_work(). And playing some tricks
> > with irq_works isn't easy either - you cannot actually rely on any other
> > CPU doing anything (even a timer tick) because of NOHZ.
> > 
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?

A new version. Switched to [printk] kthread.
Minimally tested only.

===8<===8<===

>From 6f697488b20b356e4103ddcc4f8b0ec1fa6a9531 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 7 Mar 2016 23:53:46 +0900
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory), thus printk() uses its own dedicated printing
kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
set we potentially can receive delays in printing until workqueue
declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention,
is that deferred printk() messages may appear before @printk_thread is
allocated, so in the very beginning we have to print deferred messages
the old way -- via IRQs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 193 +++++++++++++++++++++++++-----------
 2 files changed, 145 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0a21e4..5b01941 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..323566b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,98 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ *
+ * The default value on UP systems is 'true'.
+ */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async vprintk_emit() */
+static struct task_struct *printk_thread;
+/* Wait for printing wakeups from async vprintk_emit() */
+static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
+
+static int printing_func(void *data)
+{
+	while (1) {
+		DECLARE_WAITQUEUE(wait, current);
+		set_current_state(TASK_INTERRUPTIBLE);
+		add_wait_queue(&printing_wait, &wait);
+
+		schedule();
+		remove_wait_queue(&printing_wait, &wait);
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_thread(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_thread = kthread_run(printing_func, NULL, "printk");
+	BUG_ON(IS_ERR(printk_thread));
+	return 0;
+}
+late_initcall(init_printk_thread);
+
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_OUTPUT	(1<<1)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1702,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1713,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
+		/*
+		 * Deferred sched messages must not be printed
+		 * synchronously regardless the @printk_sync or @in_panic.
+		 */
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1752,10 +1852,41 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 */
+	if (!sync_print) {
+		if (printk_thread && !in_panic) {
+			/*
+			 * Wakeup the printing kthread and offload printing
+			 * to a schedulable context.
+			 */
+			wake_up(&printing_wait);
+		} else if (in_sched) {
+			/*
+			 * @in_sched messages may come too early, when we don't
+			 * yet have @printk_thread. We can't print deferred
+			 * messages directly, because this may deadlock, route
+			 * them via IRQ context.
+			 */
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else {
+			sync_print = true;
+		}
+	}
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2855,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07  8:22             ` Jan Kara
  2016-03-07 10:12               ` Sergey Senozhatsky
@ 2016-03-07 15:42               ` Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2016-03-07 15:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek,
	linux-kernel, sergey.senozhatsky

Hello,

On Mon, Mar 07, 2016 at 09:22:30AM +0100, Jan Kara wrote:
> > > I don't know what MAYDAY is. I'm talking about a situation where printing_work
> > > work item is not processed (i.e. printing_work_func() is not called) until
> > > current work item calls schedule_timeout_*().

That was because the work item was percpu and not marked
CPU_INTENSIVE.  Either using an unbound or CPU_INTENSIVE workqueue
should be enough.

> > > We had a problem that since vmstat_work work item was using system_wq,
> > > vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
> > > kworker was looping inside memory allocator without calling schedule_timeout_*()
> > > due to disk_events_workfn() doing GFP_NOIO allocation).
> > 
> > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > (WQ_MEM_RECLAIM).

Because WQ_MEM_RECLAIM only guarantees concurrency of 1, it doesn't
make sense to set it to a shared workqueue.  A dedicated workquee
should be created per domain which needs forward progress guarantee.

> > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
> > need to look more.
> 
> Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> creating a worker process and wakes up rescuer thread. However I don't see
> that as a problem...

I don't think it matters.  At that point, the system should already be
thrashing heavily and everything is crawling anyway.  A couple jiffies
delay isn't gonna be noticeable.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 15:10                     ` Sergey Senozhatsky
@ 2016-03-07 15:49                       ` Tejun Heo
  2016-03-08 10:21                         ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2016-03-07 15:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek,
	linux-kernel

Hello,

On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote:
> A new version. Switched to [printk] kthread.

There are some benefits to using a percpu workqueue with CPU_INTENSIVE
set or an unbound workqueue.  It'd need WQ_RESCUER so it'd still
create a dedicated thread which is used under heavy memory pressure
but workqueue will usaully give you local (cpu or node) worker.  One
reason to use kthread directly tho is minimizing the amount of
dependency which is pretty important for printk.  If we decide to use
kthread instead of workqueue, let's please do it for the right reason.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 15:49                       ` Tejun Heo
@ 2016-03-08 10:21                         ` Sergey Senozhatsky
  2016-03-11 17:22                           ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-08 10:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Tetsuo Handa,
	akpm, jack, pmladek, linux-kernel

Hello Tejun,

On (03/07/16 10:49), Tejun Heo wrote:
> On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote:
> > A new version. Switched to [printk] kthread.
> 
> There are some benefits to using a percpu workqueue with CPU_INTENSIVE
> set or an unbound workqueue.  It'd need WQ_RESCUER so it'd still
> create a dedicated thread which is used under heavy memory pressure
> but workqueue will usaully give you local (cpu or node) worker.  One
> reason to use kthread directly tho is minimizing the amount of
> dependency which is pretty important for printk.  If we decide to use
> kthread instead of workqueue, let's please do it for the right reason.

thanks.

I'd personally prefer to go with the "less dependency" option -- a dedicated
kthread, I think. mostly for the sake of simplicity. I agree with the point
that console_unlock() has unpredictable execution time, and in general case
we would have a busy kworker (or sleeping in console_lock() or doing
cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
that don't depend on printk. printk with dedicated printk-kthread seems
easier to control. how does it sound?

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-07 12:16                   ` Jan Kara
  2016-03-07 12:37                     ` Tetsuo Handa
  2016-03-07 15:10                     ` Sergey Senozhatsky
@ 2016-03-09  6:09                     ` Sergey Senozhatsky
  2016-03-10  9:27                       ` Jan Kara
  2016-03-10  9:53                       ` Petr Mladek
  2 siblings, 2 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-09  6:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj,
	linux-kernel, sergey.senozhatsky

Hello Jan,

On (03/07/16 13:16), Jan Kara wrote:
[..]
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?

I have this patch on top of the series now. In short, it closes one more
possibility of lockups -- console_lock()/console_unlock() calls. the patch
splits console_unlock() in two parts:
-- the fast one just wake up printing kthread
-- the slow one does call_console_drivers() loop

I think it sort of makes sense to tweak the patch below a bit and fold it
into 0001, and move _some_ of the vprintk_emit() checks to console_unlock().

very schematically, after folding, vprintk_emit() will be

	if (in_sched) {
		if (!printk_sync && printk_thread)
			wake_up()
		else
			irq_work_queue()
	}
	
	if (!in_sched)
		if (console_trylock())
			console_unlock()

and console_unlock() will be

	if (!in_panic && !printk_sync && printk_thread) {
		up_console_sem()
		wake_up()
	} else {
		console_unlock_for_printk()
	}

console_unlock_for_printk() does the call_console_drivers() loop.

console_flush_on_panic() and printing_func() call console_unlock_for_printk()
directly.


What do you think? Or would you prefer to first introduce async
printk() rework, and move to console_unlock() in vprintk_emit()
one release cycle later?
IOW, in 3 steps:
-- first make printk() async
-- then console_unlock() async, and use console_unlock_for_printk() in
   vprintk_emit()
-- then switch to console_unlock() in vprintk_emit().


below is the patch which introduces console_unlock_for_printk().
not the squashed console_unlock_for_printk() and 0001.

	-ss

======

>From bc3932c68c5afb9bf43af98335c705c75067a93a Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH 3/4] printk: introduce console_unlock_for_printk()

Even though we already have asynchronous printk()->vprintk_emit(),
there are still good chances to get lockups, because we don't have
asynchronous console_unlock(). So any process doing console_lock()
and console_unlock() will end up looping in console_unlock(), pushing
the messages to console drivers (possibly with IRQs or preemption
disabled), regardless the fact that we have a dedicated kthread for
that particular job.

Apart from that, console_lock()/console_unlock() can be executed by
user processes as a part of system calls:

a)  SyS_open()
     ...
      chrdev_open()
       tty_open()
        console_device()
         console_lock()
         console_unlock()
          for (;;) {
           call_console_drivers()
          }

b) SyS_read()
    ...
     sysfs_read_file()
      dev_attr_show()
       show_cons_active()
        console_lock()
        console_unlock()
         for (;;) {
          call_console_drivers()
         }

c) doing `cat /proc/consoles`
  SyS_read()
   vfs_read()
    proc_reg_read()
     seq_read()
      c_stop()
       console_unlock()
        for (;;) {
         call_console_drivers()
        }

etc.

This can add unnecessary latencies to the user space processes.

This patch splits console_unlock() in two parts:
-- the fast path up() console semaphore and wake up printing kthread
   (if there is one, of course), otherwise
-- the slow path: does what console_unlock() did previously, emit
   the messages and then up() console semaphore

The actual printing loop is, thus, moved to a new function,
console_unlock_for_printk(). There are 3 places that
unconditionally call it:
 a) direct printing from vprintk_emit()
 b) console_flush_on_panic()
 c) printing kthread callback

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 51 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de45d86..ddaf62e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -303,6 +303,8 @@ static struct task_struct *printk_thread;
 /* Wait for printing wakeups from async vprintk_emit() */
 static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
 
+static void console_unlock_for_printk(void);
+
 static int printing_func(void *data)
 {
 	while (1) {
@@ -314,7 +316,7 @@ static int printing_func(void *data)
 		remove_wait_queue(&printing_wait, &wait);
 
 		console_lock();
-		console_unlock();
+		console_unlock_for_printk();
 	}
 
 	return 0;
@@ -1900,7 +1902,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * /dev/kmsg and syslog() users.
 		 */
 		if (console_trylock())
-			console_unlock();
+			console_unlock_for_printk();
 		lockdep_on();
 	}
 
@@ -2339,20 +2341,20 @@ out:
 #define PRINT_MSGS_BEFORE_OOPS 100
 
 /**
- * console_unlock - unlock the console system
+ * console_unlock_for_printk - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
  * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
+ * by printk().  If this is the case, console_unlock_for_printk() emits
  * the output prior to releasing the lock.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * console_unlock_for_printk() may be called from any context.
  */
-void console_unlock(void)
+static void console_unlock_for_printk(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
@@ -2511,6 +2513,41 @@ skip:
 	if (wake_klogd)
 		wake_up_klogd();
 }
+
+
+/**
+ * console_unlock - unlock the console system
+ *
+ * Releases the console_lock which the caller holds on the console system.
+ *
+ * The fast path is to wake up the printing kthread (if the system can
+ * perform asynchronous printing) and return; the slow path is to emit
+ * the messages directly invoking console_unlock_for_printk().
+ *
+ * console_unlock() may be called from any context.
+ */
+void console_unlock(void)
+{
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+
+	if (in_panic) {
+		/*
+		 * If the system is in panic console_flush_on_panic() issued
+		 * from panic_cpu will flush the messages.
+		 */
+		console_locked = 0;
+		up_console_sem();
+		return;
+	}
+
+	if (!printk_sync && printk_thread) {
+		console_locked = 0;
+		up_console_sem();
+		wake_up(&printing_wait);
+	} else {
+		console_unlock_for_printk();
+	}
+}
 EXPORT_SYMBOL(console_unlock);
 
 /**
@@ -2567,7 +2604,7 @@ void console_flush_on_panic(void)
 	 */
 	console_trylock();
 	console_may_schedule = 0;
-	console_unlock();
+	console_unlock_for_printk();
 }
 
 /*
-- 
2.8.0.rc0.1.gd285ab0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-09  6:09                     ` Sergey Senozhatsky
@ 2016-03-10  9:27                       ` Jan Kara
  2016-03-10 15:48                         ` Sergey Senozhatsky
  2016-03-10  9:53                       ` Petr Mladek
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-03-10  9:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel,
	sergey.senozhatsky

On Wed 09-03-16 15:09:50, Sergey Senozhatsky wrote:
> Hello Jan,
> 
> On (03/07/16 13:16), Jan Kara wrote:
> [..]
> > > So if this will be a problem in practice, using a kthread will probably be
> > > the easiest solution.
> > 
> > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> > create kthread anyway as a rescuer thread, it may be the simplest to just
> > go back to using a single kthread for printing. What do you think?
> 
> I have this patch on top of the series now. In short, it closes one more
> possibility of lockups -- console_lock()/console_unlock() calls. the patch
> splits console_unlock() in two parts:
> -- the fast one just wake up printing kthread
> -- the slow one does call_console_drivers() loop
> 
> I think it sort of makes sense to tweak the patch below a bit and fold it
> into 0001, and move _some_ of the vprintk_emit() checks to console_unlock().
> 
> very schematically, after folding, vprintk_emit() will be
> 
> 	if (in_sched) {
> 		if (!printk_sync && printk_thread)
> 			wake_up()
> 		else
> 			irq_work_queue()
> 	}
> 	
> 	if (!in_sched)
> 		if (console_trylock())
> 			console_unlock()
> 
> and console_unlock() will be
> 
> 	if (!in_panic && !printk_sync && printk_thread) {
> 		up_console_sem()
> 		wake_up()
> 	} else {
> 		console_unlock_for_printk()
> 	}
> 
> console_unlock_for_printk() does the call_console_drivers() loop.
> 
> console_flush_on_panic() and printing_func() call console_unlock_for_printk()
> directly.
> 
> 
> What do you think? Or would you prefer to first introduce async
> printk() rework, and move to console_unlock() in vprintk_emit()
> one release cycle later?
> IOW, in 3 steps:
> -- first make printk() async
> -- then console_unlock() async, and use console_unlock_for_printk() in
>    vprintk_emit()
> -- then switch to console_unlock() in vprintk_emit().
> 
> 
> below is the patch which introduces console_unlock_for_printk().
> not the squashed console_unlock_for_printk() and 0001.

So I think this should definitely stay as a separate patch since it
possibly changes user visible behavior and sometimes blocking may be
actually desirable for userspace. I don't have that strong opinion whether
it should be in a separate patch set or part of this one. Maybe a separate
patch set would be somewhat better so that we first hash out possible issues
with async printk and once that's settled we start messing more with the
code changing the behavior of console_unlock() as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-09  6:09                     ` Sergey Senozhatsky
  2016-03-10  9:27                       ` Jan Kara
@ 2016-03-10  9:53                       ` Petr Mladek
  2016-03-10 16:26                         ` Sergey Senozhatsky
  1 sibling, 1 reply; 28+ messages in thread
From: Petr Mladek @ 2016-03-10  9:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Tetsuo Handa, akpm, jack, tj, linux-kernel, sergey.senozhatsky

On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote:
> On (03/07/16 13:16), Jan Kara wrote:
> What do you think? Or would you prefer to first introduce async
> printk() rework, and move to console_unlock() in vprintk_emit()
> one release cycle later?
> IOW, in 3 steps:
> -- first make printk() async
> -- then console_unlock() async, and use console_unlock_for_printk() in
>    vprintk_emit()
>
> -- then switch to console_unlock() in vprintk_emit().

I would sort this by priorities. I know about real-world problems
that will get solved by async printk. I haven't heard yet people
complaining about blocked console_lock()/console_unlock() calls
outside printk code. So, I would personally prefer to handle
async printk first.

Heh, you opened an interesting can of worms. There are definitely
locations that just want to manipulate the list of consoles and
their setting without the need to push the date. I wonder how
many locations really need to push the data.

Note that console_unlock_for_printk() might be a bit
misleading. Especially when you suggest to replace it by
console_unlock() in vprintk_emit() ;-) I wonder if
console_flush_and_unlock() might be more descriptive.

We might even split flush_console() into a separate function in the end.
I think that the combination with unlock() is there to make sure
that somebody will flush the last messages from printk(), see
the retry stuff. It probably won't be needed with the asynch printk().

Anyway, all these console_unlock() changes looks like another big step
and I suggest to do it separately.

I looking forward to see where this ends.

Thanks,
Petr

PS: I want to check more precisely the async printk patchset but
I am repeatedy sidetracked this week :-(

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-10  9:27                       ` Jan Kara
@ 2016-03-10 15:48                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-10 15:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj,
	linux-kernel, sergey.senozhatsky

On (03/10/16 10:27), Jan Kara wrote:
[..]
> So I think this should definitely stay as a separate patch since it
> possibly changes user visible behavior and sometimes blocking may be
> actually desirable for userspace. I don't have that strong opinion whether
> it should be in a separate patch set or part of this one. Maybe a separate
> patch set would be somewhat better so that we first hash out possible issues
> with async printk and once that's settled we start messing more with the
> code changing the behavior of console_unlock() as well.

agree, thanks. I'll split the series and submit console_unlock() rework
separately.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-10  9:53                       ` Petr Mladek
@ 2016-03-10 16:26                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-10 16:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Tetsuo Handa, akpm, jack, tj,
	linux-kernel, sergey.senozhatsky

On (03/10/16 10:53), Petr Mladek wrote:
[..]
> On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote:
> > On (03/07/16 13:16), Jan Kara wrote:
> > What do you think? Or would you prefer to first introduce async
> > printk() rework, and move to console_unlock() in vprintk_emit()
> > one release cycle later?
> > IOW, in 3 steps:
> > -- first make printk() async
> > -- then console_unlock() async, and use console_unlock_for_printk() in
> >    vprintk_emit()
> >
> > -- then switch to console_unlock() in vprintk_emit().
> 
> I would sort this by priorities.

I agree, let's settle down async printk() first.

> I know about real-world problems that will get solved by
> async printk. I haven't heard yet people complaining about
> blocked console_lock()/console_unlock() calls outside printk
> code. So, I would personally prefer to handle async printk
> first.

well, I see some problems with console_lock()/console_unlock()  :)

> Heh, you opened an interesting can of worms. There are definitely
> locations that just want to manipulate the list of consoles and
> their setting without the need to push the date. I wonder how
> many locations really need to push the data.

I've tested it briefly on some of the setups that I have around,
and the boot time reduced by (very roughly) ~20+%; systemd and
friends do a number of tty/etc. calls, and stuck in console_unock()
each time. of course, the "pre-condition" here are printk()s from
drivers/etc. (frequent enough to keep call_console_drivers() busy,
not necessarily "pressure").

even on my laptop, userspace does a ton of console_unlock()

...
  [<ffffffff8108a904>] console_unlock+0x24/0x89
  [<ffffffff8108ba76>] console_device+0x4a/0x54
  [<ffffffff81261fbb>] tty_open+0x127/0x4c5
  [<ffffffff81145316>] chrdev_open+0x13f/0x164
  [<ffffffff811451d7>] ? cdev_put+0x23/0x23
  [<ffffffff8113fc88>] do_dentry_open.isra.1+0x1b3/0x29e
  [<ffffffff81140791>] vfs_open+0x53/0x58
  [<ffffffff8114e40d>] path_openat+0xa37/0xc8c
  [<ffffffff8114f2a2>] do_filp_open+0x4d/0xa3
  [<ffffffff8115c1f2>] ? __alloc_fd+0x1ae/0x1c0
  [<ffffffff813acdff>] ? _raw_spin_unlock+0x27/0x31
  [<ffffffff81140a8b>] do_sys_open+0x13c/0x1cc
  [<ffffffff81140a8b>] ? do_sys_open+0x13c/0x1cc
  [<ffffffff81140b39>] SyS_open+0x1e/0x20
  [<ffffffff813ad4a5>] entry_SYSCALL_64_fastpath+0x18/0xa8

____fput()->console_unlock()

  [<ffffffff8108a904>] console_unlock+0x24/0x89
  [<ffffffff81271910>] con_shutdown+0x2d/0x30
  [<ffffffff8125e99d>] release_tty+0x52/0x12e
  [<ffffffff81260722>] tty_release+0x436/0x453
  [<ffffffff81142bb3>] __fput+0x107/0x1ba
  [<ffffffff81142c9c>] ____fput+0xe/0x10
  [<ffffffff8105d504>] task_work_run+0x67/0x90
  [<ffffffff810011cc>] exit_to_usermode_loop+0x66/0x84
  [<ffffffff8100179c>] syscall_return_slowpath+0x8d/0x92
  [<ffffffff813ad533>] entry_SYSCALL_64_fastpath+0xa6/0xa8
...

etc.


> Note that console_unlock_for_printk() might be a bit
> misleading. Especially when you suggest to replace it by
> console_unlock() in vprintk_emit() ;-) I wonder if
> console_flush_and_unlock() might be more descriptive.

oh, yes, the function name was absolutely random.
console_flush_and_unlock() looks good.

> We might even split flush_console() into a separate function in the end.
> I think that the combination with unlock() is there to make sure
> that somebody will flush the last messages from printk(), see
> the retry stuff. It probably won't be needed with the asynch printk().
> 
> Anyway, all these console_unlock() changes looks like another big step
> and I suggest to do it separately.

ok.

> PS: I want to check more precisely the async printk patchset but
> I am repeatedy sidetracked this week :-(

no prob! it's a pre-merge period, no pressure.

I'll re-spin the printk() patch tomorrow, I think.
async console_unlock() will be separated.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-08 10:21                         ` Sergey Senozhatsky
@ 2016-03-11 17:22                           ` Tejun Heo
  2016-03-12  5:01                             ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2016-03-11 17:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Tetsuo Handa, akpm, jack, pmladek,
	linux-kernel

Hello, Sergey.

On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote:
> I'd personally prefer to go with the "less dependency" option -- a dedicated
> kthread, I think. mostly for the sake of simplicity. I agree with the point
> that console_unlock() has unpredictable execution time, and in general case
> we would have a busy kworker (or sleeping in console_lock() or doing
> cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
> that don't depend on printk. printk with dedicated printk-kthread seems
> easier to control. how does it sound?

I don't think it makes sense to avoid workqueue for execution latency.
The only case which can matter is the rescuer case and as I wrote
before the system is already in an extremely high latency mode by the
time rescuer is needed, so it's unlikely to make noticeable
differences.

However, I agree that using kthread is a good idea here just to reduce
the amount of dependency as prink working even during complex failures
is important.  workqueue itself is fairly complex and it also requires
timer and task creation to work correctly for proper operation.
That's a lot of extra dependency.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
  2016-03-11 17:22                           ` Tejun Heo
@ 2016-03-12  5:01                             ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-03-12  5:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Tetsuo Handa,
	akpm, jack, pmladek, linux-kernel

Hello Tejun,

On (03/11/16 12:22), Tejun Heo wrote:
> On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote:
> > I'd personally prefer to go with the "less dependency" option -- a dedicated
> > kthread, I think. mostly for the sake of simplicity. I agree with the point
> > that console_unlock() has unpredictable execution time, and in general case
> > we would have a busy kworker (or sleeping in console_lock() or doing
> > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
> > that don't depend on printk. printk with dedicated printk-kthread seems
> > easier to control. how does it sound?
> 
> I don't think it makes sense to avoid workqueue for execution latency.
> The only case which can matter is the rescuer case and as I wrote
> before the system is already in an extremely high latency mode by the
> time rescuer is needed, so it's unlikely to make noticeable
> differences.
>
> However, I agree that using kthread is a good idea here just to reduce
> the amount of dependency as prink working even during complex failures
> is important.  workqueue itself is fairly complex and it also requires
> timer and task creation to work correctly for proper operation.
> That's a lot of extra dependency.

Thanks!

I agree that, in some cases (if not in most of them) the "value" of printk()
output is inversely proportional to the system health -- the worst the state,
the more attention people pay to printk() output; so a simpler solution here
gives more confidence.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-03-12  5:03 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky
2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky
2016-03-06  6:32   ` Sergey Senozhatsky
2016-03-06  7:18     ` Tetsuo Handa
2016-03-06  9:35       ` Sergey Senozhatsky
2016-03-06 11:06         ` Tetsuo Handa
2016-03-06 13:27           ` Sergey Senozhatsky
2016-03-06 14:54             ` Tetsuo Handa
2016-03-07  8:22             ` Jan Kara
2016-03-07 10:12               ` Sergey Senozhatsky
2016-03-07 10:52                 ` Jan Kara
2016-03-07 12:16                   ` Jan Kara
2016-03-07 12:37                     ` Tetsuo Handa
2016-03-07 15:10                     ` Sergey Senozhatsky
2016-03-07 15:49                       ` Tejun Heo
2016-03-08 10:21                         ` Sergey Senozhatsky
2016-03-11 17:22                           ` Tejun Heo
2016-03-12  5:01                             ` Sergey Senozhatsky
2016-03-09  6:09                     ` Sergey Senozhatsky
2016-03-10  9:27                       ` Jan Kara
2016-03-10 15:48                         ` Sergey Senozhatsky
2016-03-10  9:53                       ` Petr Mladek
2016-03-10 16:26                         ` Sergey Senozhatsky
2016-03-07 14:40                   ` Sergey Senozhatsky
2016-03-07 11:10                 ` Tetsuo Handa
2016-03-07 14:36                   ` Sergey Senozhatsky
2016-03-07 15:42               ` Tejun Heo
2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.