All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH v3 0/2] printk: Make printk() completely async
@ 2016-03-13 13:34 Sergey Senozhatsky
  2016-03-13 13:34 ` [RFC][PATCH v3 1/2] " Sergey Senozhatsky
  2016-03-13 13:34 ` [RFC][PATCH v3 2/2] printk: Skip messages on oops Sergey Senozhatsky
  0 siblings, 2 replies; 5+ messages in thread
From: Sergey Senozhatsky @ 2016-03-13 13:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Hello,

 RFC

 The patch set is based on slightly updated Jan Kara's patches.

This patch set makes printk() completely asynchronous: new messages
are getting upended to the kernel printk buffer, but instead of 'direct'
printing the actual print job is performed by a dedicated kthread.
This has the advantage that printing always happens from a schedulable
context and thus we don't lockup any particular CPU or even interrupts.


The patch set is against next-20160311 with reverted NMI patches.

The patch that makes console_unlock() asynchronous will be posted
later (once this patch set is done), as a separate further improvement.



v3 (Sergey):
-- use a dedicated kthread for printing instead of using wq (Jan, Tetsuo, Tejun)

v2 (Sergey):
- use dedicated printk workqueue with WQ_MEM_RECLAIM bit
- fallback to system-wide workqueue only if allocation of printk_wq has
  failed
- do not use system_wq as a fallback wq. both console_lock() and onsole_unlock()
  can spend a significant amount of time; so we need to use system_long_wq.
- rework sync/!sync detection logic
  a) we can have deferred (in_sched) messages before we allocate printk_wq,
     so the only way to handle those messages is via IRQ context
  b) even in printk.synchronous mode, deferred messages must not be printed
     directly, and should go via IRQ context
  c) even if we allocated printk_wq and have !sync_printk mode, we must route
     deferred messages via IRQ context
- so this adds additional bool flags to vprint_emit() and introduces a new
  pending bit to `printk_pending'
- fix build on !PRINTK configs


Jan Kara (2):
  printk: Make printk() completely async
  printk: Skip messages on oops

 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 229 ++++++++++++++++++++++++++----------
 2 files changed, 180 insertions(+), 59 deletions(-)

-- 
2.8.0.rc0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC][PATCH v3 1/2] printk: Make printk() completely async
  2016-03-13 13:34 [RFC][PATCH v3 0/2] printk: Make printk() completely async Sergey Senozhatsky
@ 2016-03-13 13:34 ` Sergey Senozhatsky
  2016-03-14  7:11   ` Jan Kara
  2016-03-13 13:34 ` [RFC][PATCH v3 2/2] printk: Skip messages on oops Sergey Senozhatsky
  1 sibling, 1 reply; 5+ messages in thread
From: Sergey Senozhatsky @ 2016-03-13 13:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory), thus printk() uses its own dedicated printing
kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
set we potentially can receive delays in printing until workqueue
declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention,
is that deferred printk() messages may appear before @printk_thread is
allocated, so in the very beginning we have to print deferred messages
the old way -- via IRQs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 193 +++++++++++++++++++++++++-----------
 2 files changed, 145 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1e58ae9..454999e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..323566b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,98 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ *
+ * The default value on UP systems is 'true'.
+ */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async vprintk_emit() */
+static struct task_struct *printk_thread;
+/* Wait for printing wakeups from async vprintk_emit() */
+static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
+
+static int printing_func(void *data)
+{
+	while (1) {
+		DECLARE_WAITQUEUE(wait, current);
+		set_current_state(TASK_INTERRUPTIBLE);
+		add_wait_queue(&printing_wait, &wait);
+
+		schedule();
+		remove_wait_queue(&printing_wait, &wait);
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_thread(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_thread = kthread_run(printing_func, NULL, "printk");
+	BUG_ON(IS_ERR(printk_thread));
+	return 0;
+}
+late_initcall(init_printk_thread);
+
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_OUTPUT	(1<<1)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1702,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1713,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
+		/*
+		 * Deferred sched messages must not be printed
+		 * synchronously regardless the @printk_sync or @in_panic.
+		 */
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1752,10 +1852,41 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 */
+	if (!sync_print) {
+		if (printk_thread && !in_panic) {
+			/*
+			 * Wakeup the printing kthread and offload printing
+			 * to a schedulable context.
+			 */
+			wake_up(&printing_wait);
+		} else if (in_sched) {
+			/*
+			 * @in_sched messages may come too early, when we don't
+			 * yet have @printk_thread. We can't print deferred
+			 * messages directly, because this may deadlock, route
+			 * them via IRQ context.
+			 */
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else {
+			sync_print = true;
+		}
+	}
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2855,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC][PATCH v3 2/2] printk: Skip messages on oops
  2016-03-13 13:34 [RFC][PATCH v3 0/2] printk: Make printk() completely async Sergey Senozhatsky
  2016-03-13 13:34 ` [RFC][PATCH v3 1/2] " Sergey Senozhatsky
@ 2016-03-13 13:34 ` Sergey Senozhatsky
  1 sibling, 0 replies; 5+ messages in thread
From: Sergey Senozhatsky @ 2016-03-13 13:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

When there are too many messages in the kernel printk buffer it can take
very long to print them to console (especially when using slow serial
console). This is undesirable during oops so when we encounter oops and
there are more than 100 messages to print, print just the newest 100
messages and then the oops message.

[ss: fix tiny config build]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 323566b..de45d86 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -268,6 +268,9 @@ static u64 console_seq;
 static u32 console_idx;
 static enum log_flags console_prev;
 
+/* current record sequence when oops happened */
+static u64 oops_start_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1767,6 +1770,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	if (oops_in_progress && !oops_start_seq)
+		oops_start_seq = log_next_seq;
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -2000,6 +2006,7 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
+static u64 oops_start_seq;
 static enum log_flags syslog_prev;
 static u64 log_first_seq;
 static u32 log_first_idx;
@@ -2325,6 +2332,12 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * When oops happens and there are more messages to be printed in the printk
+ * buffer that this, skip some mesages and print only this many newest messages.
+ */
+#define PRINT_MSGS_BEFORE_OOPS 100
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2393,7 +2406,28 @@ again:
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		/*
+		 * If oops happened and there are more than
+		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
+		 * skip them to make oops appear faster.
+		 */
+		if (oops_start_seq &&
+		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
+			len = sprintf(text,
+				"** %u printk messages dropped due to oops ** ",
+				(unsigned)(oops_start_seq - console_seq -
+					   PRINT_MSGS_BEFORE_OOPS));
+			if (console_seq < log_first_seq) {
+				console_seq = log_first_seq;
+				console_idx = log_first_idx;
+			}
+			while (console_seq <
+			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
+				console_idx = log_next(console_idx);
+				console_seq++;
+			}
+			console_prev = 0;
+		} else if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
 
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH v3 1/2] printk: Make printk() completely async
  2016-03-13 13:34 ` [RFC][PATCH v3 1/2] " Sergey Senozhatsky
@ 2016-03-14  7:11   ` Jan Kara
  2016-03-14  8:17     ` Sergey Senozhatsky
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2016-03-14  7:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky, Jan Kara

On Sun 13-03-16 22:34:46, Sergey Senozhatsky wrote:
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when synchronous printing was explicitly requested by
> +	 * kernel parameter, or when console_verbose() was called to print
> +	 * everything during panic / oops.
> +	 */
> +	if (!sync_print) {
> +		if (printk_thread && !in_panic) {
> +			/*
> +			 * Wakeup the printing kthread and offload printing
> +			 * to a schedulable context.
> +			 */
> +			wake_up(&printing_wait);
> +		} else if (in_sched) {
> +			/*
> +			 * @in_sched messages may come too early, when we don't
> +			 * yet have @printk_thread. We can't print deferred
> +			 * messages directly, because this may deadlock, route
> +			 * them via IRQ context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));

You cannot call wake_up() from a scheduler context - i.e., if in_sched is
true, you have to use irq work unconditionally. You can wake up kthread
from irq work or just directly print the message there. Otherwise the patch
looks fine.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH v3 1/2] printk: Make printk() completely async
  2016-03-14  7:11   ` Jan Kara
@ 2016-03-14  8:17     ` Sergey Senozhatsky
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Senozhatsky @ 2016-03-14  8:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On (03/14/16 08:11), Jan Kara wrote:
[..]
> > +	if (!sync_print) {
> > +		if (printk_thread && !in_panic) {
> > +			/*
> > +			 * Wakeup the printing kthread and offload printing
> > +			 * to a schedulable context.
> > +			 */
> > +			wake_up(&printing_wait);
> > +		} else if (in_sched) {
> > +			/*
> > +			 * @in_sched messages may come too early, when we don't
> > +			 * yet have @printk_thread. We can't print deferred
> > +			 * messages directly, because this may deadlock, route
> > +			 * them via IRQ context.
> > +			 */
> > +			__this_cpu_or(printk_pending,
> > +					PRINTK_PENDING_OUTPUT);
> > +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 
> You cannot call wake_up() from a scheduler context - i.e., if in_sched is
> true, you have to use irq work unconditionally. You can wake up kthread
> from irq work or just directly print the message there. Otherwise the patch
> looks fine.

oh, thanks for catching this! you are right.

	-ss

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-14  8:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-13 13:34 [RFC][PATCH v3 0/2] printk: Make printk() completely async Sergey Senozhatsky
2016-03-13 13:34 ` [RFC][PATCH v3 1/2] " Sergey Senozhatsky
2016-03-14  7:11   ` Jan Kara
2016-03-14  8:17     ` Sergey Senozhatsky
2016-03-13 13:34 ` [RFC][PATCH v3 2/2] printk: Skip messages on oops Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.