linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH v4 0/2] printk: Make printk() completely async
@ 2016-03-14 14:13 Sergey Senozhatsky
  2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-14 14:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

 RFC

 The patch set is based on slightly updated Jan Kara's patches.

This patch set makes printk() completely asynchronous: new messages
are appended to the kernel printk buffer, but instead of 'direct'
printing, the actual print job is performed by a dedicated kthread.
This has the advantage that printing always happens from a schedulable
context and thus we don't lock up any particular CPU or even interrupts.


The patch set is against next-20160311 with reverted NMI patches.

The patch that makes console_unlock() asynchronous will be posted
later (once this patch set is done; or maybe slightly earlier, so
it'll see wider testing), as a separate further improvement.


v4 (Sergey):
-- do not directly wake_up() the printing kthread from vprintk_emit(), need
   to go via IRQ->wake_up() to avoid sched deadlocks (Jan)

v3 (Sergey):
-- use a dedicated kthread for printing instead of using wq (Jan, Tetsuo, Tejun)

v2 (Sergey):
- use dedicated printk workqueue with WQ_MEM_RECLAIM bit
- fallback to system-wide workqueue only if allocation of printk_wq has
  failed
- do not use system_wq as a fallback wq. both console_lock() and console_unlock()
  can spend a significant amount of time, so we need to use system_long_wq.
- rework sync/!sync detection logic
  a) we can have deferred (in_sched) messages before we allocate printk_wq,
     so the only way to handle those messages is via IRQ context
  b) even in printk.synchronous mode, deferred messages must not be printed
     directly, and should go via IRQ context
  c) even if we allocated printk_wq and have !sync_printk mode, we must route
     deferred messages via IRQ context
- so this adds additional bool flags to vprintk_emit() and introduces a new
  pending bit to `printk_pending'
- fix build on !PRINTK configs


Jan Kara (2):
  printk: Make printk() completely async
  printk: Skip messages on oops

 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 235 +++++++++++++++++++++++++++---------
 2 files changed, 186 insertions(+), 59 deletions(-)

-- 
2.8.0.rc0


* [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-14 14:13 [RFC][PATCH v4 0/2] printk: Make printk() completely async Sergey Senozhatsky
@ 2016-03-14 14:13 ` Sergey Senozhatsky
  2016-03-15 10:03   ` Jan Kara
  2016-03-15 15:58   ` Petr Mladek
  2016-03-14 14:13 ` [RFC][PATCH v4 2/2] printk: Skip messages on oops Sergey Senozhatsky
  2016-04-23 19:36 ` [RFC][PATCH v4 0/2] printk: Make printk() completely async Pavel Machek
  2 siblings, 2 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-14 14:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, printk() sometimes waits for the message to be printed to the
console and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to the
console, it prints messages from the kernel printk buffer until the buffer
is empty. When a serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend an unbounded
amount of time printing in console_unlock(). This is an especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed that a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered), resulting in RCU stalls (the CPU doing the printing doesn't
reach a quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually machine death
(as messages from stalls and lockups are appended to the printk buffer
faster than we are able to print them). So these machines are unable to boot
with a serial console attached. Another observed issue is that due to slow
printk, hardware discovery is slow and udev times out before the kernel
manages to discover all the attached HW. Also, during artificial stress
testing, a SATA disk disappears from the system because its interrupts
aren't served for too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends the message to the kernel
printk buffer and wake_up()s a special dedicated kthread to do the
printing to the console. This has the advantage that printing always
happens from a schedulable context and thus we don't lock up any
particular CPU or even interrupts. It also has the advantage that
printk() is fast and thus kernel booting is not slowed down by a slow
serial console. The disadvantage of this method is that in case of a
crash there is a higher chance that important messages won't appear in
the console output (we may need working scheduling to print the message
to the console). We somewhat mitigate this risk by switching printk to
the original method of immediate printing to the console if an oops is
in progress. Also, for debugging purposes, we provide the
printk.synchronous kernel parameter, which restores the original printk
behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory); thus printk() uses its own dedicated printing
kthread rather than relying on a workqueue (even with the WQ_MEM_RECLAIM
bit set we can potentially see delays in printing until the workqueue
declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention
is that deferred printk() messages may appear before the printing kthread
is created, so in the very beginning we have to print deferred messages
the old way -- via IRQs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 199 +++++++++++++++++++++++++-----------
 2 files changed, 151 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1e58ae9..454999e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..3c4f8b7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,102 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ *
+ * The default value on UP systems is 'true'.
+ */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async vprintk_emit() */
+static struct task_struct *printk_thread;
+/* Wait for printing wakeups from async vprintk_emit() */
+static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
+
+static int printing_func(void *data)
+{
+	while (1) {
+		DECLARE_WAITQUEUE(wait, current);
+		set_current_state(TASK_INTERRUPTIBLE);
+		add_wait_queue(&printing_wait, &wait);
+
+		schedule();
+		remove_wait_queue(&printing_wait, &wait);
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_thread(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_thread = kthread_run(printing_func, NULL, "printk");
+	BUG_ON(IS_ERR(printk_thread));
+	return 0;
+}
+late_initcall(init_printk_thread);
+
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_IRQ_OUTPUT	(1<<1)
+#define PRINTK_PENDING_KTHREAD_OUTPUT	(1<<2)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_KTHREAD_OUTPUT)
+		wake_up(&printing_wait);
+
+	if (pending & PRINTK_PENDING_IRQ_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1706,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1717,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
+		/*
+		 * Deferred sched messages must not be printed
+		 * synchronously regardless the @printk_sync or @in_panic.
+		 */
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1752,10 +1856,43 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 */
+	if (!sync_print) {
+		if (printk_thread && !in_panic) {
+			/*
+			 * This will wakeup the printing kthread and offload
+			 * printing to a schedulable context.
+			 */
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_KTHREAD_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else if (in_sched) {
+			/*
+			 * @in_sched messages may come too early, when we don't
+			 * yet have @printk_thread. We can't print deferred
+			 * messages directly, because this may deadlock, route
+			 * them via IRQ context.
+			 */
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_IRQ_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else {
+			sync_print = true;
+		}
+	}
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2861,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0


* [RFC][PATCH v4 2/2] printk: Skip messages on oops
  2016-03-14 14:13 [RFC][PATCH v4 0/2] printk: Make printk() completely async Sergey Senozhatsky
  2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
@ 2016-03-14 14:13 ` Sergey Senozhatsky
  2016-03-17 10:56   ` Sergey Senozhatsky
  2016-04-23 19:36 ` [RFC][PATCH v4 0/2] printk: Make printk() completely async Pavel Machek
  2 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-14 14:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara

From: Jan Kara <jack@suse.cz>

When there are too many messages in the kernel printk buffer it can take
a very long time to print them to the console (especially when using a
slow serial console). This is undesirable during an oops, so when we
encounter an oops and there are more than 100 messages to print, print
just the newest 100 messages and then the oops message.

[ss: fix tiny config build]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3c4f8b7..cb15c7b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -268,6 +268,9 @@ static u64 console_seq;
 static u32 console_idx;
 static enum log_flags console_prev;
 
+/* current record sequence when oops happened */
+static u64 oops_start_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1771,6 +1774,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	if (oops_in_progress && !oops_start_seq)
+		oops_start_seq = log_next_seq;
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -2006,6 +2012,7 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
+static u64 oops_start_seq;
 static enum log_flags syslog_prev;
 static u64 log_first_seq;
 static u32 log_first_idx;
@@ -2331,6 +2338,12 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * When an oops happens and there are more messages to be printed in the printk
+ * buffer than this, skip some messages and print only this many newest messages.
+ */
+#define PRINT_MSGS_BEFORE_OOPS 100
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2399,7 +2412,28 @@ again:
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		/*
+		 * If oops happened and there are more than
+		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
+		 * skip them to make oops appear faster.
+		 */
+		if (oops_start_seq &&
+		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
+			len = sprintf(text,
+				"** %u printk messages dropped due to oops ** ",
+				(unsigned)(oops_start_seq - console_seq -
+					   PRINT_MSGS_BEFORE_OOPS));
+			if (console_seq < log_first_seq) {
+				console_seq = log_first_seq;
+				console_idx = log_first_idx;
+			}
+			while (console_seq <
+			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
+				console_idx = log_next(console_idx);
+				console_seq++;
+			}
+			console_prev = 0;
+		} else if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
 
-- 
2.8.0.rc0


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
@ 2016-03-15 10:03   ` Jan Kara
  2016-03-15 14:07     ` Sergey Senozhatsky
  2016-03-15 15:58   ` Petr Mladek
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2016-03-15 10:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, Sergey Senozhatsky, Jan Kara

> +	if (!sync_print) {
> +		if (printk_thread && !in_panic) {
> +			/*
> +			 * This will wakeup the printing kthread and offload
> +			 * printing to a schedulable context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_KTHREAD_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		} else if (in_sched) {
> +			/*
> +			 * @in_sched messages may come too early, when we don't
> +			 * yet have @printk_thread. We can't print deferred
> +			 * messages directly, because this may deadlock, route
> +			 * them via IRQ context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_IRQ_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		} else {
> +			sync_print = true;
> +		}
> +	}

I'm a bit undecided whether we want to go through irq work even for the
common case of !in_sched messages or whether we want to directly call
wake_up() in that case. Maybe I'd do it like:

	if (!sync_print) {
		if (in_sched) {
			__this_cpu_or(printk_pending,
				      PRINTK_PENDING_IRQ_OUTPUT);
			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
		} else if (printk_thread && !in_panic) {
			wake_up(&printing_wait);
		} else {
			sync_print = true;
		}
	}

and the wake_up_klogd_work_func() would look like:

static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
	int pending = __this_cpu_xchg(printk_pending, 0);

	if (pending & PRINTK_PENDING_OUTPUT) {
		if (printk_thread) {
			wake_up(&printing_wait);
		} else {
			/*
			 * If trylock fails, someone else is doing the printing
			 */
			if (console_trylock())
				console_unlock();
		}
	}
	...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-15 10:03   ` Jan Kara
@ 2016-03-15 14:07     ` Sergey Senozhatsky
  2016-03-16  5:39       ` Byungchul Park
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-15 14:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	Sergey Senozhatsky

On (03/15/16 11:03), Jan Kara wrote:
[..]
> > +	if (!sync_print) {
> > +		if (printk_thread && !in_panic) {
> > +			/*
> > +			 * This will wakeup the printing kthread and offload
> > +			 * printing to a schedulable context.
> > +			 */
> > +			__this_cpu_or(printk_pending,
> > +					PRINTK_PENDING_KTHREAD_OUTPUT);
> > +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +		} else if (in_sched) {
> > +			/*
> > +			 * @in_sched messages may come too early, when we don't
> > +			 * yet have @printk_thread. We can't print deferred
> > +			 * messages directly, because this may deadlock, route
> > +			 * them via IRQ context.
> > +			 */
> > +			__this_cpu_or(printk_pending,
> > +					PRINTK_PENDING_IRQ_OUTPUT);
> > +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +		} else {
> > +			sync_print = true;
> > +		}
> > +	}
> 
> I'm a bit undecided whether we want to go through irq work even for the
> common case of !in_sched messages or whether we want to directly call
> wake_up() in that case. Maybe I'd do it like:

thanks for the review. I like the change that you have suggested. I just
updated the wake_up_klogd_work_func() check to '!printk_sync && printk_thread',
since the user can change `printk_sync' via sysfs.


what I wanted to do with this "irq indirection" is to move wake_up() out of
vprintk_emit(), but I didn't think it through well enough, I guess.

so, __wake_up() takes spin_lock_irqsave(&q->lock, flags), which, with
spin_debug enabled, can recursively call printk(). since we wake_up()
the printing thread after we release logbuf_lock and reset logbuf_cpu to
UINT_MAX, we have no chance to survive this recursion.

/* theoretical possibilities of having printk() in __wake_up_common() or down
 * the call chain are not real concerns. */

so I guess what I want to do now is to move part of this wake_up()
decision making under the logbuf_lock and logbuf_cpu protected area, so the
recursion detection code will be able to catch this. and we also need to
switch back to sync printk once recursion has happened, just to shut up the
faulty wake_up() in case it was the root cause.


something like this (*minimally tested so far*).

-- move wake_up() and friends under the logbuf section, so we can detect
   printk() recursion from wake_up()

-- in the recursion_bug branch, switch to sync printk. we don't know why we
   recursed, maybe because of wake_up()->spin_lock(). doing
   kthread_stop()->wake_up_process() can be unsafe, I guess, so just set
   `printk_sync' to true.


Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c              | 202 +++++++++++++++++++++++++-----------
 2 files changed, 154 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1e58ae9..454999e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5fd844..38baed1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm-generic/sections.h>
@@ -284,6 +285,105 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ *
+ * The default value on UP systems is 'true'.
+ */
+static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/* Printing kthread for async vprintk_emit() */
+static struct task_struct *printk_thread;
+/* Wait for printing wakeups from async vprintk_emit() */
+static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
+
+static int printing_func(void *data)
+{
+	while (1) {
+		DECLARE_WAITQUEUE(wait, current);
+		set_current_state(TASK_INTERRUPTIBLE);
+		add_wait_queue(&printing_wait, &wait);
+
+		schedule();
+		remove_wait_queue(&printing_wait, &wait);
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+static int __init init_printk_thread(void)
+{
+	if (printk_sync)
+		return 0;
+
+	printk_thread = kthread_run(printing_func, NULL, "printk");
+	BUG_ON(IS_ERR(printk_thread));
+	return 0;
+}
+late_initcall(init_printk_thread);
+
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	(1<<0)
+#define PRINTK_PENDING_OUTPUT	(1<<1)
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		if (!printk_sync && printk_thread) {
+			wake_up(&printing_wait);
+		} else {
+			/*
+			 * If trylock fails, someone else is doing
+			 * the printing
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1609,6 +1709,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	/* cpu currently holding logbuf_lock in this function */
+	static unsigned int logbuf_cpu = UINT_MAX;
 	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
@@ -1618,12 +1720,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
+	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
+	bool sync_print = printk_sync;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
+		/*
+		 * Deferred sched messages must not be printed
+		 * synchronously regardless the @printk_sync or @in_panic.
+		 */
 		in_sched = true;
+		sync_print = false;
 	}
 
 	boot_delay_msec(level);
@@ -1660,6 +1767,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		static const char recursion_msg[] =
 			"BUG: recent printk recursion!";
 
+		printk_sync = true;
 		recursion_bug = false;
 		/* emit KERN_CRIT message */
 		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
@@ -1749,13 +1857,45 @@ asmlinkage int vprintk_emit(int facility, int level,
 						 dict, dictlen, text, text_len);
 	}
 
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when synchronous printing was explicitly requested by
+	 * kernel parameter, or when console_verbose() was called to print
+	 * everything during panic / oops.
+	 */
+	if (!sync_print) {
+		if (in_sched) {
+			/*
+			 * @in_sched messages may come too early, when we don't
+			 * yet have @printk_thread. We can't print deferred
+			 * messages directly, because this may deadlock, route
+			 * them via IRQ context.
+			 */
+			__this_cpu_or(printk_pending,
+					PRINTK_PENDING_OUTPUT);
+			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		} else if (printk_thread && !in_panic) {
+			/*
+			 * This will wakeup the printing kthread and offload
+			 * printing to a schedulable context.
+			 */
+			wake_up(&printing_wait);
+		} else {
+			sync_print = true;
+		}
+	}
+
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
@@ -2724,60 +2864,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
  2016-03-15 10:03   ` Jan Kara
@ 2016-03-15 15:58   ` Petr Mladek
  2016-03-16  2:01     ` Sergey Senozhatsky
  1 sibling, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2016-03-15 15:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Byungchul Park, Sergey Senozhatsky, Jan Kara

On Mon 2016-03-14 23:13:39, Sergey Senozhatsky wrote:
> From: Jan Kara <jack@suse.cz>
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to console. This has the advantage that printing always happens
> from a schedulable context and thus we don't lock up any particular CPU or
> even interrupts. Also it has the advantage that printk() is fast and
> thus kernel booting is not slowed down by slow serial console.
> Disadvantage of this method is that in case of crash there is higher
> chance that important messages won't appear in console output (we may
> need working scheduling to print message to console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to console if oops is in progress.  Also for
> debugging purposes we provide printk.synchronous kernel parameter which
> resorts to the original printk behavior.
> 
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory), thus printk() uses its own dedicated printing
> kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> set we potentially can receive delays in printing until workqueue
> declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention
> is that deferred printk() messages may appear before the printing kthread
> is created, so in the very beginning we have to print deferred messages
> the old way -- via IRQs.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  Documentation/kernel-parameters.txt |  10 ++
>  kernel/printk/printk.c              | 199 +++++++++++++++++++++++++-----------
>  2 files changed, 151 insertions(+), 58 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 1e58ae9..454999e 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d5fd844..3c4f8b7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/utsname.h>
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm-generic/sections.h>
> @@ -284,6 +285,102 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + *
> + * The default value on UP systems is 'true'.
> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);

If we make it writable, we also need to handle the situation where
it gets disabled at runtime. That means making sure that the kthread
will be running even if printk_sync was set during boot.

> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async vprintk_emit() */
> +static struct task_struct *printk_thread;
> +/* Wait for printing wakeups from async vprintk_emit() */
> +static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
> +
> +static int printing_func(void *data)
> +{
> +	while (1) {
> +		DECLARE_WAITQUEUE(wait, current);
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		add_wait_queue(&printing_wait, &wait);
> +
> +		schedule();
> +		remove_wait_queue(&printing_wait, &wait);

This whole thing looks strange. But it is also possible
that I do not have enough knowledge.

First, I think that this is racy. IMHO, we should do some
check before calling schedule(). We should not sleep if
anyone woke us up before we got registered on the waitqueue.

Second, I wonder what the motivation was to use the waitqueue.
One advantage is that we do not need to check if the process
exists, but then we also do not catch problems. Otherwise,
it looks too complicated. IMHO, it might be much easier
to just sleep here and call wake_up_process().

What about this?

int need_flush_console;

	while (1) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (!need_flush_console)
			schedule();
		__set_current_state(TASK_RUNNING);

		need_flush_console = false;

> +		console_lock();
> +		console_unlock();
> +	}

Also I wonder if we need some special handling of the system freezing,
but I do not think so.

> +
> +	return 0;
> +}
> +
> +static int __init init_printk_thread(void)
> +{
> +	if (printk_sync)
> +		return 0;
> +
> +	printk_thread = kthread_run(printing_func, NULL, "printk");
> +	BUG_ON(IS_ERR(printk_thread));

I would prefer to force the synchronous mode instead.

> +	return 0;
> +}
> +late_initcall(init_printk_thread);
> +
> +/*
> + * Delayed printk version, for scheduler-internal messages:

This is no longer related to the scheduler only.

BTW: I suggest to move this whole section in a separate patch.
It will be more clear what has changed for the async printk
and what stays for the deferred printk.

> + */
> +#define PRINTK_PENDING_WAKEUP	(1<<0)
> +#define PRINTK_PENDING_IRQ_OUTPUT	(1<<1)
> +#define PRINTK_PENDING_KTHREAD_OUTPUT	(1<<2)

Sigh, I am getting a bit lost in these names. Either
they deserve a comment or a better renaming.

I would make it clear what is related to console, klogd,
and deferred synchronous printk.

Well, I wonder if we need to distinguish the deferred
printk and KTHREAD_OUTPUT by a flag. We could
check the printk_sync global flag here.

What about this?

#define PRINTK_PENDING_KLOGD_WAKEUP	(1<<0)
#define PRINTK_PENDING_CONSOLE_OUTPUT	(1<<2)


> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	if (pending & PRINTK_PENDING_KTHREAD_OUTPUT) 
> +		wake_up(&printing_wait);
> +
> +	if (pending & PRINTK_PENDING_IRQ_OUTPUT) {
> +		/* If trylock fails, someone else is doing the printing */
> +		if (console_trylock())
> +			console_unlock();
> +	}
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);

then we could use:

	if (pending & PRINTK_PENDING_CONSOLE_OUTPUT) {
		if (printk_sync || !printk_kthread) {
			/* If trylock fails, someone else is doing the printing */
			if (console_trylock())
				console_unlock();
		} else {
			wake_up_process(printk_kthread);
		}
	}

	if (pending & PRINTK_PENDING_KLOGD_WAKEUP)
		wake_up_interruptible(&log_wait);


> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1609,6 +1706,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
>  			    const char *fmt, va_list args)
>  {
> +	/* cpu currently holding logbuf_lock in this function */
> +	static unsigned int logbuf_cpu = UINT_MAX;
>  	static bool recursion_bug;
>  	static char textbuf[LOG_LINE_MAX];
>  	char *text = textbuf;
> @@ -1618,12 +1717,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	int this_cpu;
>  	int printed_len = 0;
>  	bool in_sched = false;
> -	/* cpu currently holding logbuf_lock in this function */
> -	static unsigned int logbuf_cpu = UINT_MAX;
> +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
> +	bool sync_print = printk_sync;

I would force the global printk_sync if we are in_panic

	if (in_panic)
		printk_sync = true;
 
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> +		/*
> +		 * Deferred sched messages must not be printed
> +		 * synchronously regardless the @printk_sync or @in_panic.
> +		 */
>  		in_sched = true;
> +		sync_print = false;
>  	}
>  
>  	boot_delay_msec(level);
> @@ -1752,10 +1856,43 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when synchronous printing was explicitly requested by
> +	 * kernel parameter, or when console_verbose() was called to print
> +	 * everything during panic / oops.
> +	 */
> +	if (!sync_print) {
> +		if (printk_thread && !in_panic) {
> +			/*
> +			 * This will wakeup the printing kthread and offload
> +			 * printing to a schedulable context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_KTHREAD_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		} else if (in_sched) {
> +			/*
> +			 * @in_sched messages may come too early, when we don't
> +			 * yet have @printk_thread. We can't print deferred
> +			 * messages directly, because this may deadlock, route
> +			 * them via IRQ context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_IRQ_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();

I wonder if it might be much easier if we used only the two
PRINTK_PENDING flags and forced the global printk_sync when in panic.

	if (in_sched || (printk_kthread && !printk_sync)) {
		__this_cpu_or(printk_pending,
			      PRINTK_PENDING_CONSOLE_OUTPUT);
		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	} else {
>  		/*
>  		 * Try to acquire and then immediately release the console

Sigh, it would be great to also rename wake_up_klogd_work and
wake_up_klogd_work_func(). They are not only about klogd.
Well, this should be a separate patch as well because it
was like this even before.

I still need to think about possible races. Especially when checking
printk_kthread and printk_sync.

I hope that some of the above suggestions make sense. vprintk_emit()
is crazy enough already. I feel motivated not to make it worse ;-)

Best Regards,
Petr


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-15 15:58   ` Petr Mladek
@ 2016-03-16  2:01     ` Sergey Senozhatsky
  2016-03-16  2:10       ` Byungchul Park
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-16  2:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Tejun Heo,
	Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky,
	Jan Kara

Hello Petr,

On (03/15/16 16:58), Petr Mladek wrote:
[..]
> > +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> > +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> 
> If we make it writable, we also need to handle the situation where
> it gets disabled at runtime. That means making sure that the kthread
> will be running even if printk_sync was set during boot.

yes, I just thought this morning that maybe disabling 'write' here would
be ok.

> What about this?
> 
> int need_flush_console;
> 
> 	while(1) {
> 		set_current_state(TASK_INTERRUPTIBLE);
> 		if (!need_flush_console)
> 			schedule();
> 		__set_current_state(TASK_RUNNING);
> 
> 		need_flush_console =  false;
> 
> > +		console_lock();
> > +		console_unlock();
> > +	}

much better, indeed.
I assume `need_flush_console' is primarily for avoiding the schedule() cost?
Not that it closes the race window 100%: it can be false at the time we
check it, and become true by the time we schedule(). TASK_INTERRUPTIBLE
should prevent the lost wake_up() case, AFAIK.

> Also I wonder if we need some special handling of the system freezing,
> but I do not think so.

hm, I don't think so either.

> > +	printk_thread = kthread_run(printing_func, NULL, "printk");
> > +	BUG_ON(IS_ERR(printk_thread));
> 
> I would prefer to force the synchronous mode instead.

ok, no strong opinion here, I thought that if the system can't create
a kthread in late_initcall(), then it probably doesn't have many chances
to survive anyway.

> > + * Delayed printk version, for scheduler-internal messages:
> 
> This is no longer related to the scheduler only.

this has changed. KTHREAD/IRQ  split is not needed anymore, please
see below.

> BTW: I suggest to move this whole section in a separate patch.
> It will be more clear what has changed for the async printk
> and what stays for the deferred printk.

hm, sounds good.


> 	if (pending & PRINTK_PENDING_CONSOLE_OUTPUT) {
> 		if (printk_sync || !printk_kthread) {
> 			/* If trylock fails, someone else is doing the printing */
> 			if (console_trylock())
> 				console_unlock();
> 		} else {
> 			wake_up_process(printk_kthread);
> 		}
> 	}
> 
> 	if (pending & PRINTK_PENDING_KLOGD_WAKEUP)
> 		wake_up_interruptible(&log_wait);

yes, agree. this is what I have here:
http://marc.info/?l=linux-kernel&m=145805101825604

> > +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
> > +	bool sync_print = printk_sync;
> 
> I would force the global printk_sync if we are in_panic
> 
> 	if (in_panic)
> 		printk_sync = true;

can add, yes.

> > -	/* If called from the scheduler, we can not call up(). */
> > -	if (!in_sched) {
> > +	if (sync_print) {
> >  		lockdep_off();
> 
> I wonder if it might be much easier if we used only the two
> PRINTK_PENDING flags and forced the global printk_sync when in panic.

two PENDING flags stuff was my bad. (I replied here
http://marc.info/?l=linux-kernel&m=145805101825604)

in short, my intention was to move it out of that part of vprintk_emit() that
can recurse but cannot detect the recursion. wake_up()/wake_up_process()
take spin_locks/etc., which add possibilities of
        vprintk_emit()->spin_lock()->spin_dump()->vprintk_emit()->...
that will not be handled by the vprintk_emit() recursion detection code. But
I guess I simply want to move this under the logbuf lock section after all,
so printk recursion detection will have better chances to help us out.

> Sigh, it would be great to also rename wake_up_klogd_work and
> wake_up_klogd_work_func(). They are not only about klogd.
> Well, this should be a separate patch as well because it
> was like this even before.

hm, yes, as a separate patch later I think.

> I still need to think about possible races. Especially when checking
> printk_kthread and printk_sync.

hm, I don't think we risk anything here. If a CPU saw an 'old' (NULL)
@printk_kthread, then it would just do direct printk. Once it's !NULL, we can
wake it up. Is your concern here that `pointer = VALUE' can be !atomic?

> I hope that some of the above suggestions make sense. vprintk_emit()
> is crazy enough already. I feel motivated not to make it worse ;-)

thanks for review.

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  2:01     ` Sergey Senozhatsky
@ 2016-03-16  2:10       ` Byungchul Park
  2016-03-16  2:31         ` Sergey Senozhatsky
  0 siblings, 1 reply; 23+ messages in thread
From: Byungchul Park @ 2016-03-16  2:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Jan Kara,
	Tejun Heo, Tetsuo Handa, linux-kernel, Jan Kara

On Wed, Mar 16, 2016 at 11:01:15AM +0900, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (03/15/16 16:58), Petr Mladek wrote:
> [..]
> > > +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> > > +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> > 
> > If we make it writable, we also need to handle the situation where
> > it gets disabled at runtime. That means making sure that the kthread
> > will be running even if printk_sync was set during boot.
> 
> yes, I just thought this morning that maybe disabling 'write' here would
> be ok.
> 
> > What about this?
> > 
> > int need_flush_console;
> > 
> > 	while(1) {
> > 		set_current_state(TASK_INTERRUPTIBLE);
> > 		if (!need_flush_console)
> > 			schedule();

                else // This should be here, IMHO. Hm?

> > 		__set_current_state(TASK_RUNNING);
> > 
> > 		need_flush_console =  false;
> > 
> > > +		console_lock();
> > > +		console_unlock();
> > > +	}
> [..]


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  2:10       ` Byungchul Park
@ 2016-03-16  2:31         ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-16  2:31 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky,
	Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel,
	Jan Kara

On (03/16/16 11:10), Byungchul Park wrote:
> > > What about this?
> > > 
> > > int need_flush_console;
> > > 
> > > 	while(1) {
> > > 		set_current_state(TASK_INTERRUPTIBLE);
> > > 		if (!need_flush_console)
> > > 			schedule();
> 
>                 else // This should be here, IMHO. Hm?

Do __set_current_state(TASK_RUNNING) only if need_flush_console == true?
Why? I see no problems with Petr's code.

> > > 		__set_current_state(TASK_RUNNING);
> > > 
> > > 		need_flush_console =  false;
> > > 
> > > > +		console_lock();
> > > > +		console_unlock();
> > > > +	}

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-15 14:07     ` Sergey Senozhatsky
@ 2016-03-16  5:39       ` Byungchul Park
  2016-03-16  6:58         ` Sergey Senozhatsky
  2016-03-16  7:00         ` Byungchul Park
  0 siblings, 2 replies; 23+ messages in thread
From: Byungchul Park @ 2016-03-16  5:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Jan Kara, Petr Mladek, Tejun Heo,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On Tue, Mar 15, 2016 at 11:07:38PM +0900, Sergey Senozhatsky wrote:
> 
> something like this (*minimally tested so far*).
> 
> -- move wake_up() and friends under the logbuf section; so we can detect
                                          ^^^^^^^^^^^^^^
					  section protected by logbuf_lock?

>    printk() recursion from wake_up()

Excuse me, but I fear that it can cause an unnecessary deadlock.
Furthermore, it cannot be handled if it is caused by logbuf_lock. IMHO,
nothing, including synced printk, can help us in this case. Is there
something I missed? I'm sorry if I did, but could you let me know if so?

I mean that it would be better to keep wake_up() and friends out of the
critical section protected by logbuf_lock.

Thanks,
Byungchul

> 
> -- in the recursion_bug branch switch to sync printk. we don't know why we
>    recursed, maybe because of wake_up()->spin_lock(). doing
>    kthread_stop()->wake_up_process() can be unsafe, I guess, just set
>    `printk_sync' to true.
> 
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  Documentation/kernel-parameters.txt |  10 ++
>  kernel/printk/printk.c              | 202 +++++++++++++++++++++++++-----------
>  2 files changed, 154 insertions(+), 58 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 1e58ae9..454999e 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d5fd844..38baed1 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/utsname.h>
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm-generic/sections.h>
> @@ -284,6 +285,105 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + *
> + * The default value on UP systems is 'true'.
> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async vprintk_emit() */
> +static struct task_struct *printk_thread;
> +/* Wait for printing wakeups from async vprintk_emit() */
> +static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
> +
> +static int printing_func(void *data)
> +{
> +	while (1) {
> +		DECLARE_WAITQUEUE(wait, current);
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		add_wait_queue(&printing_wait, &wait);
> +
> +		schedule();
> +		remove_wait_queue(&printing_wait, &wait);
> +
> +		console_lock();
> +		console_unlock();
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init init_printk_thread(void)
> +{
> +	if (printk_sync)
> +		return 0;
> +
> +	printk_thread = kthread_run(printing_func, NULL, "printk");
> +	BUG_ON(IS_ERR(printk_thread));
> +	return 0;
> +}
> +late_initcall(init_printk_thread);
> +
> +/*
> + * Delayed printk version, for scheduler-internal messages:
> + */
> +#define PRINTK_PENDING_WAKEUP	(1<<0)
> +#define PRINTK_PENDING_OUTPUT	(1<<1)
> +
> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	if (pending & PRINTK_PENDING_OUTPUT) {
> +		if (!printk_sync && printk_thread) {
> +			wake_up(&printing_wait);
> +		} else {
> +			/*
> +			 * If trylock fails, someone else is doing
> +			 * the printing
> +			 */
> +			if (console_trylock())
> +				console_unlock();
> +		}
> +	}
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1609,6 +1709,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
>  			    const char *fmt, va_list args)
>  {
> +	/* cpu currently holding logbuf_lock in this function */
> +	static unsigned int logbuf_cpu = UINT_MAX;
>  	static bool recursion_bug;
>  	static char textbuf[LOG_LINE_MAX];
>  	char *text = textbuf;
> @@ -1618,12 +1720,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	int this_cpu;
>  	int printed_len = 0;
>  	bool in_sched = false;
> -	/* cpu currently holding logbuf_lock in this function */
> -	static unsigned int logbuf_cpu = UINT_MAX;
> +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
> +	bool sync_print = printk_sync;
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> +		/*
> +		 * Deferred sched messages must not be printed
> +		 * synchronously regardless the @printk_sync or @in_panic.
> +		 */
>  		in_sched = true;
> +		sync_print = false;
>  	}
>  
>  	boot_delay_msec(level);
> @@ -1660,6 +1767,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		static const char recursion_msg[] =
>  			"BUG: recent printk recursion!";
>  
> +		printk_sync = true;
>  		recursion_bug = false;
>  		/* emit KERN_CRIT message */
>  		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> @@ -1749,13 +1857,45 @@ asmlinkage int vprintk_emit(int facility, int level,
>  						 dict, dictlen, text, text_len);
>  	}
>  
> +	/*
> +	 * By default we print messages to the console asynchronously so that
> +	 * the kernel doesn't get stalled by a slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when synchronous printing was explicitly requested by
> +	 * kernel parameter, or when console_verbose() was called to print
> +	 * everything during panic / oops.
> +	 */
> +	if (!sync_print) {
> +		if (in_sched) {
> +			/*
> +			 * @in_sched messages may come too early, when we don't
> +			 * yet have @printk_thread. We can't print deferred
> +			 * messages directly, because this may deadlock, route
> +			 * messages directly, because this may deadlock;
> +			 * route them via IRQ context instead.
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		} else if (printk_thread && !in_panic) {
> +			/*
> +			 * This will wake up the printing kthread and offload
> +			 * printing to a schedulable context.
> +			 */
> +			wake_up(&printing_wait);
> +		} else {
> +			sync_print = true;
> +		}
> +	}
> +
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Try to acquire and then immediately release the console
> @@ -2724,60 +2864,6 @@ late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
>  /*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);
> -
> -static void wake_up_klogd_work_func(struct irq_work *irq_work)
> -{
> -	int pending = __this_cpu_xchg(printk_pending, 0);
> -
> -	if (pending & PRINTK_PENDING_OUTPUT) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> -	}
> -
> -	if (pending & PRINTK_PENDING_WAKEUP)
> -		wake_up_interruptible(&log_wait);
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> -	.func = wake_up_klogd_work_func,
> -	.flags = IRQ_WORK_LAZY,
> -};
> -
> -void wake_up_klogd(void)
> -{
> -	preempt_disable();
> -	if (waitqueue_active(&log_wait)) {
> -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	}
> -	preempt_enable();
> -}
> -
> -int printk_deferred(const char *fmt, ...)
> -{
> -	va_list args;
> -	int r;
> -
> -	preempt_disable();
> -	va_start(args, fmt);
> -	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> -	va_end(args);
> -
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	preempt_enable();
> -
> -	return r;
> -}
> -
> -/*
>   * printk rate limiting, lifted from the networking subsystem.
>   *
>   * This enforces a rate limit: not more than 10 kernel messages
> -- 
> 2.8.0.rc0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  5:39       ` Byungchul Park
@ 2016-03-16  6:58         ` Sergey Senozhatsky
  2016-03-16  7:30           ` Byungchul Park
  2016-03-16  7:00         ` Byungchul Park
  1 sibling, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-16  6:58 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky

On (03/16/16 14:39), Byungchul Park wrote:
> On Tue, Mar 15, 2016 at 11:07:38PM +0900, Sergey Senozhatsky wrote:
> > 
> > something like this (*minimally tested so far*).
> > 
> > -- move wake_up() and friends under the logbuf section; so we can detect
>                                           ^^^^^^^^^^^^^^
> 					  section protected by logbuf_lock?
> 
> >    printk() recursion from wake_up()
> 
> Excuse me, but I fear that it can cause an unnecessary deadlock.

where? shouldn't it be

vprintk_emit()
  local_irq_save(flags);
  this_cpu = smp_processor_id();

  raw_spin_lock(&logbuf_lock);
  static logbuf_cpu = this_cpu;

     wake_up_process()
      spin_lock()
       spin_dump()
        vprintk_emit()				<< recursion
         local_irq_save(flags);
         this_cpu = smp_processor_id();

         if (unlikely(logbuf_cpu == this_cpu)) {
            recursion_bug = true;
            return;   				<< break recursion
         }

?

> Furthermore it cannot be handled if it is caused by logbuf_lock.

sure, I'm not even trying to address a logbuf spin_dump recursion problem here.
I'm only trying to minimize the impact of newly introduced spin locks.

I don't have a very clear vision (at the moment) of how to fix printk recursion
caused by logbuf_lock or console_sem corruptions, etc. Should spin_dump()
be aware of the locks that can be taken by printk()? Hm, I can't even count all
the locks that can possibly be taken by printk->console_drivers, and most likely
don't even see all of the cases where printk can recurse. Disable lock debugging
in vprintk_emit() the same way lockdep is disabled? Hm...

Ingo's POV is that printk must be reworked and become smarter in this aspect.

> I mean that it would be better to keep the wake_up and friend out of the
> critical section by logbuf_lock.

in this case wake_up_process() can recurse us forever.

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  5:39       ` Byungchul Park
  2016-03-16  6:58         ` Sergey Senozhatsky
@ 2016-03-16  7:00         ` Byungchul Park
  2016-03-16  7:07           ` Sergey Senozhatsky
  1 sibling, 1 reply; 23+ messages in thread
From: Byungchul Park @ 2016-03-16  7:00 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Jan Kara, Petr Mladek, Tejun Heo,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On Wed, Mar 16, 2016 at 02:39:44PM +0900, Byungchul Park wrote:
> On Tue, Mar 15, 2016 at 11:07:38PM +0900, Sergey Senozhatsky wrote:
> > 
> > something like this (*minimally tested so far*).
> > 
> > -- move wake_up() and friends under the logbuf section; so we can detect
>                                           ^^^^^^^^^^^^^^
> 					  section protected by logbuf_lock?

I am afraid I mis-interpreted the meaning of *under*. :-( You may have meant
*below*, but I interpreted it as *within*. I am sorry for the confusion caused
by my poor English. I absolutely agree with your point, if it means *below*.

> 
> >    printk() recursion from wake_up()
> 
> Excuse me, but I fear that it can cause an unnecessary deadlock.
> Furthermore it cannot be handled if it is caused by logbuf_lock. IMHO,
> nothing including synced printk can help us in this case. Is there
> something I missed? I am sorry if I did, but could you let me know if so?
> 
> I mean that it would be better to keep the wake_up and friend out of the
> critical section by logbuf_lock.
> 
> Thanks,
> Byungchul
> 
> > 
> > -- in the recursion_bug branch, switch to sync printk. we don't know why we
> >    recursed, maybe because of wake_up()->spin_lock(). doing
> >    kthread_stop()->wake_up_process() can be unsafe, I guess, just set
> >    `printk_sync' to true.
> > 
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  Documentation/kernel-parameters.txt |  10 ++
> >  kernel/printk/printk.c              | 202 +++++++++++++++++++++++++-----------
> >  2 files changed, 154 insertions(+), 58 deletions(-)
> > 
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 1e58ae9..454999e 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  	printk.time=	Show timing data prefixed to each printk message line
> >  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
> >  
> > +	printk.synchronous=
> > +			By default kernel messages are printed to console
> > +			asynchronously (except during early boot or when oops
> > +			is happening). That avoids the kernel stalling behind a
> > +			slow serial console and thus avoids softlockups, interrupt
> > +			timeouts, or userspace timing out during heavy printing.
> > +			However for debugging problems, printing messages to
> > +			console immediately may be desirable. This option
> > +			enables such behavior.
> > +
> >  	processor.max_cstate=	[HW,ACPI]
> >  			Limit processor to maximum C-state
> >  			max_cstate=9 overrides any DMI blacklist limit.
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index d5fd844..38baed1 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/utsname.h>
> >  #include <linux/ctype.h>
> >  #include <linux/uio.h>
> > +#include <linux/kthread.h>
> >  
> >  #include <asm/uaccess.h>
> >  #include <asm-generic/sections.h>
> > @@ -284,6 +285,105 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> >  static char *log_buf = __log_buf;
> >  static u32 log_buf_len = __LOG_BUF_LEN;
> >  
> > +/*
> > + * When true, printing to console will happen synchronously unless someone else
> > + * is already printing messages.
> > + *
> > + * The default value on UP systems is 'true'.
> > + */
> > +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> > +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> > +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> > +
> > +/* Printing kthread for async vprintk_emit() */
> > +static struct task_struct *printk_thread;
> > +/* Wait for printing wakeups from async vprintk_emit() */
> > +static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
> > +
> > +static int printing_func(void *data)
> > +{
> > +	while (1) {
> > +		DECLARE_WAITQUEUE(wait, current);
> > +		set_current_state(TASK_INTERRUPTIBLE);
> > +		add_wait_queue(&printing_wait, &wait);
> > +
> > +		schedule();
> > +		remove_wait_queue(&printing_wait, &wait);
> > +
> > +		console_lock();
> > +		console_unlock();
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init init_printk_thread(void)
> > +{
> > +	if (printk_sync)
> > +		return 0;
> > +
> > +	printk_thread = kthread_run(printing_func, NULL, "printk");
> > +	BUG_ON(IS_ERR(printk_thread));
> > +	return 0;
> > +}
> > +late_initcall(init_printk_thread);
> > +
> > +/*
> > + * Delayed printk version, for scheduler-internal messages:
> > + */
> > +#define PRINTK_PENDING_WAKEUP	(1<<0)
> > +#define PRINTK_PENDING_OUTPUT	(1<<1)
> > +
> > +static DEFINE_PER_CPU(int, printk_pending);
> > +
> > +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> > +{
> > +	int pending = __this_cpu_xchg(printk_pending, 0);
> > +
> > +	if (pending & PRINTK_PENDING_OUTPUT) {
> > +		if (!printk_sync && printk_thread) {
> > +			wake_up(&printing_wait);
> > +		} else {
> > +			/*
> > +			 * If trylock fails, someone else is doing
> > +			 * the printing
> > +			 */
> > +			if (console_trylock())
> > +				console_unlock();
> > +		}
> > +	}
> > +
> > +	if (pending & PRINTK_PENDING_WAKEUP)
> > +		wake_up_interruptible(&log_wait);
> > +}
> > +
> > +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> > +	.func = wake_up_klogd_work_func,
> > +	.flags = IRQ_WORK_LAZY,
> > +};
> > +
> > +void wake_up_klogd(void)
> > +{
> > +	preempt_disable();
> > +	if (waitqueue_active(&log_wait)) {
> > +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	}
> > +	preempt_enable();
> > +}
> > +
> > +int printk_deferred(const char *fmt, ...)
> > +{
> > +	va_list args;
> > +	int r;
> > +
> > +	va_start(args, fmt);
> > +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> > +	va_end(args);
> > +
> > +	return r;
> > +}
> > +
> >  /* Return log buffer address */
> >  char *log_buf_addr_get(void)
> >  {
> > @@ -1609,6 +1709,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  			    const char *dict, size_t dictlen,
> >  			    const char *fmt, va_list args)
> >  {
> > +	/* cpu currently holding logbuf_lock in this function */
> > +	static unsigned int logbuf_cpu = UINT_MAX;
> >  	static bool recursion_bug;
> >  	static char textbuf[LOG_LINE_MAX];
> >  	char *text = textbuf;
> > @@ -1618,12 +1720,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	int this_cpu;
> >  	int printed_len = 0;
> >  	bool in_sched = false;
> > -	/* cpu currently holding logbuf_lock in this function */
> > -	static unsigned int logbuf_cpu = UINT_MAX;
> > +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
> > +	bool sync_print = printk_sync;
> >  
> >  	if (level == LOGLEVEL_SCHED) {
> >  		level = LOGLEVEL_DEFAULT;
> > +		/*
> > +		 * Deferred sched messages must not be printed
> > +		 * synchronously, regardless of @printk_sync or @in_panic.
> > +		 */
> >  		in_sched = true;
> > +		sync_print = false;
> >  	}
> >  
> >  	boot_delay_msec(level);
> > @@ -1660,6 +1767,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  		static const char recursion_msg[] =
> >  			"BUG: recent printk recursion!";
> >  
> > +		printk_sync = true;
> >  		recursion_bug = false;
> >  		/* emit KERN_CRIT message */
> >  		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> > @@ -1749,13 +1857,45 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  						 dict, dictlen, text, text_len);
> >  	}
> >  
> > +	/*
> > +	 * By default we print messages to the console asynchronously so that
> > +	 * the kernel doesn't get stalled by a slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when synchronous printing was explicitly requested by
> > +	 * kernel parameter, or when console_verbose() was called to print
> > +	 * everything during panic / oops.
> > +	 */
> > +	if (!sync_print) {
> > +		if (in_sched) {
> > +			/*
> > +			 * @in_sched messages may come too early, when we don't
> > +			 * yet have @printk_thread. We can't print deferred
> > +			 * messages directly, because this may deadlock;
> > +			 * route them via IRQ context instead.
> > +			 */
> > +			__this_cpu_or(printk_pending,
> > +					PRINTK_PENDING_OUTPUT);
> > +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +		} else if (printk_thread && !in_panic) {
> > +			/*
> > +			 * This will wake up the printing kthread and offload
> > +			 * printing to a schedulable context.
> > +			 */
> > +			wake_up(&printing_wait);
> > +		} else {
> > +			sync_print = true;
> > +		}
> > +	}
> > +
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> >  	local_irq_restore(flags);
> >  
> > -	/* If called from the scheduler, we can not call up(). */
> > -	if (!in_sched) {
> > +	if (sync_print) {
> >  		lockdep_off();
> >  		/*
> >  		 * Try to acquire and then immediately release the console
> > @@ -2724,60 +2864,6 @@ late_initcall(printk_late_init);
> >  
> >  #if defined CONFIG_PRINTK
> >  /*
> > - * Delayed printk version, for scheduler-internal messages:
> > - */
> > -#define PRINTK_PENDING_WAKEUP	0x01
> > -#define PRINTK_PENDING_OUTPUT	0x02
> > -
> > -static DEFINE_PER_CPU(int, printk_pending);
> > -
> > -static void wake_up_klogd_work_func(struct irq_work *irq_work)
> > -{
> > -	int pending = __this_cpu_xchg(printk_pending, 0);
> > -
> > -	if (pending & PRINTK_PENDING_OUTPUT) {
> > -		/* If trylock fails, someone else is doing the printing */
> > -		if (console_trylock())
> > -			console_unlock();
> > -	}
> > -
> > -	if (pending & PRINTK_PENDING_WAKEUP)
> > -		wake_up_interruptible(&log_wait);
> > -}
> > -
> > -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> > -	.func = wake_up_klogd_work_func,
> > -	.flags = IRQ_WORK_LAZY,
> > -};
> > -
> > -void wake_up_klogd(void)
> > -{
> > -	preempt_disable();
> > -	if (waitqueue_active(&log_wait)) {
> > -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> > -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > -	}
> > -	preempt_enable();
> > -}
> > -
> > -int printk_deferred(const char *fmt, ...)
> > -{
> > -	va_list args;
> > -	int r;
> > -
> > -	preempt_disable();
> > -	va_start(args, fmt);
> > -	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> > -	va_end(args);
> > -
> > -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > -	preempt_enable();
> > -
> > -	return r;
> > -}
> > -
> > -/*
> >   * printk rate limiting, lifted from the networking subsystem.
> >   *
> >   * This enforces a rate limit: not more than 10 kernel messages
> > -- 
> > 2.8.0.rc0


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  7:00         ` Byungchul Park
@ 2016-03-16  7:07           ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-16  7:07 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky

On (03/16/16 16:00), Byungchul Park wrote:
> On Wed, Mar 16, 2016 at 02:39:44PM +0900, Byungchul Park wrote:
> > On Tue, Mar 15, 2016 at 11:07:38PM +0900, Sergey Senozhatsky wrote:
> > > 
> > > something like this (*minimally tested so far*).
> > > 
> > > -- move wake_up() and friends under the logbuf section; so we can detect
> >                                           ^^^^^^^^^^^^^^
> > 					  section protected by logbuf_lock?
> 
> I am afraid I mis-interpreted the meaning of *under*. :-( You may have meant
> *below*, but I interpreted it as *within*. I am sorry for the confusion caused
> by my poor English. I absolutely agree with your point, if it means *below*.

my bad. this should have been "under the logbuf LOCK".

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  6:58         ` Sergey Senozhatsky
@ 2016-03-16  7:30           ` Byungchul Park
  2016-03-16  7:56             ` Sergey Senozhatsky
  0 siblings, 1 reply; 23+ messages in thread
From: Byungchul Park @ 2016-03-16  7:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On Wed, Mar 16, 2016 at 03:58:51PM +0900, Sergey Senozhatsky wrote:
> I don't have a very clear vision (at the moment) of how to fix printk recursion
> caused by logbuf lock of console_sem corruptions, etc. Should spin_dump
> be aware of the locks that can be taken by printk()? Hm, I can't even count all
> the locks that possibly can be taken by printk->console_drivers and most likely
> don't even see all of the cases where printk can recurse. Disable lock debug
> in vprintk_emit() the same way lockdep is desabled? Hm...
> 
> Ingo's POV is that printk must be reworked and become smarter in this aspect.

Yeah, it must be smarter... :-(

> 
> > I mean that it would be better to keep the wake_up and friend out of the
> > critical section by logbuf_lock.
> 
> in this case wake_up_process() can recurse us forever.

I said they should be kept *out of* the critical section. :-) Otherwise, it
can make us recurse forever.

Do you mean the wake_up_process() in console_unlock()? If so, it must be
handled in some way, e.g. by my patch, as you know.

Thanks,
Byungchul

> 
> 	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  7:30           ` Byungchul Park
@ 2016-03-16  7:56             ` Sergey Senozhatsky
  2016-03-16 10:34               ` Byungchul Park
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-16  7:56 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On (03/16/16 16:30), Byungchul Park wrote:
>
> Do you mean the wake_up_process() in console_unlock?

no, I meant wake_up_process(printk_kthread), the newly added one.


-- if we are going to have wake_up_process() in wake_up_klogd_work_func(),
then an `in_sched' message can potentially trigger a recursion chain

wake_up_klogd_work_func()->wake_up_process()->printk()->wake_up_process()->printk()...

to break this printk()->wake_up_process()->printk(), we need wake_up_process() to
be under the logbuf lock; so vprintk_emit()'s if (logbuf_cpu == this_cpu) will act.


-- if we are going to have wake_up_process() in console_unlock(), then

console_unlock()->{up(), wake_up_process()}->printk()->{console_lock(), console_unlock()}->{up(), wake_up_process()}->printk()...

is undetectable... by the time console_unlock() calls wake_up_process() there
are no printk() locks that this CPU owns.


> I said they should be kept *out of* the critical section. :-)
> Otherwise, it can recurse us forever.

can you explain?

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16  7:56             ` Sergey Senozhatsky
@ 2016-03-16 10:34               ` Byungchul Park
  2016-03-17  0:34                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 23+ messages in thread
From: Byungchul Park @ 2016-03-16 10:34 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On Wed, Mar 16, 2016 at 04:56:05PM +0900, Sergey Senozhatsky wrote:
> On (03/16/16 16:30), Byungchul Park wrote:
> >
> > Do you mean the wake_up_process() in console_unlock?
> 
> no, I meant wake_up_process(printk_kthread), the newly added one.

I got it. You are talking about wake_up_process() in Petr's patch.

> 
> 
> -- if we are going to have wake_up_process() in wake_up_klogd_work_func(),
> then we need `in_sched' message to potentially trigger a recursion chain
> 
> wake_up_klogd_work_func()->wake_up_process()->printk()->wake_up_process()->printk()...
> 
> to break this printk()->wake_up_process()->printk(), we need wake_up_process() to
> be under the logbuf lock; so vprintk_emit()'s if (logbuf_cpu == this_cpu) will act.

I am curious about how you make the wake_up_process() call and I may want
to talk about it at the next spin. Anyway, then we will lose the last
message when "if (logbuf_cpu == this_cpu)" acts. Is it acceptable?

IMHO it's not a good choice to use wake_up() and friends within printk(),
since it can additionally cause another recursion. Of course, that does not
happen if the (logbuf_cpu == this_cpu) condition acts. But I don't think
it's good to rely on that condition at the cost of losing a message. Anyway,
I really want to see your next spin and talk about it.

> 
> 
> -- if we are going to have wake_up_process() in console_unlock(), then
> 
> console_unlock()->{up(), wake_up_process()}->printk()->{console_lock(), console_unlock()}->{up(), wake_up_process()}->printk()...
> 

This cannot happen. console_lock() cannot continue because the prior
console_unlock() has not released console_sem.lock yet when
wake_up_process() is called. Only a deadlock exists. And my patch solves
the problem so that the deadlock cannot happen.

My patch is https://lkml.org/lkml/2016/3/11/192 as you already know.

> is undetectable... by the time console_unlock() calls wake_up_process() there
> are no printk() locks that this CPU owns.
> 
> 
> > I said they should be kept *out of* the critical section. :-)
> > Otherwise, it can recurse us forever.
> 
> can you explain?

Sorry. I was confused. I was wrong.

> 
> 	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-16 10:34               ` Byungchul Park
@ 2016-03-17  0:34                 ` Sergey Senozhatsky
  2016-03-18  5:49                   ` Byungchul Park
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-17  0:34 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On (03/16/16 19:34), Byungchul Park wrote:
[..]
> > -- if we are going to have wake_up_process() in wake_up_klogd_work_func(),
> > then we need `in_sched' message to potentially trigger a recursion chain
> > 
> > wake_up_klogd_work_func()->wake_up_process()->printk()->wake_up_process()->printk()...
> > 
> > to break this printk()->wake_up_process()->printk(), we need wake_up_process() to
> > be under the logbuf lock; so vprintk_emit()'s if (logbuf_cpu == this_cpu) will act.
> 
> I am curious about how you make the wake_up_process() call and I may want
> to talk about it at the next spin. Anyway, then we will lose the last
> message when "if (logbuf_cpu == this_cpu)" acts. Is it acceptable?

yes, this is how it is. "BUG: recent printk recursion!" will be printed
instead of the message.

> IMHO it's not a good choice to use wake_up() and friend within a printk()
> since it can additionally cause another recursion. Of course, it does not
> happen if the condition (logbuf_cpu == this_cpu) acts. But I don't think
> it's good to rely on the condition with losing a message. Anyway I really
> really want to see your next spin and talk.

the alternative is NOT significantly better. The pending bit is checked in
IRQ context, so one can simply do

	local_irq_save();
	while (xxx) printk();
	local_irq_restore();

and _in the worst case_ nothing will be printed to the console until IRQs are
enabled on this CPU (there are some 'if's, but the worst case is just
like this: http://marc.info/?l=linux-kernel&m=145734549308803).


I'd probably prefer to add wake_up_process() to vprintk_emit() and do it
under the logbuf lock. First, we don't suffer from disabled IRQs on the
current CPU; second, we have somewhat better chances of breaking printk()
recursion *in some cases*.

> > -- if we are going to have wake_up_process() in console_unlock(), then
> > 
> > console_unlock()->{up(), wake_up_process()}->printk()->{console_lock(), console_unlock()}->{up(), wake_up_process()}->printk()...
> > 
> 
> This cannot happen. console_lock() cannot continue because the prior
> console_unlock() does not release console_sem.lock yet when
> wake_up_process() is called. Only a deadlock exists. And my patch solves
> the problem so that the deadlock cannot happen.

ah, we got lost in patches. I was talking about yet another patch
(which you are probably not aware of; you were not Cc'd, sorry!) that
makes console_unlock() asynchronous:

http://marc.info/?l=linux-kernel&m=145750373530161

s/wake_up/wake_up_process/ is at the end of console_unlock().

while the patch belongs to another series, I still wanted to outline it
here, since we were talking about printk() recursion.

	-ss


* Re: [RFC][PATCH v4 2/2] printk: Skip messages on oops
  2016-03-14 14:13 ` [RFC][PATCH v4 2/2] printk: Skip messages on oops Sergey Senozhatsky
@ 2016-03-17 10:56   ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-17 10:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello Jan,

On (03/14/16 23:13), Sergey Senozhatsky wrote:
> 
> From: Jan Kara <jack@suse.cz>
> 
> When there are too many messages in the kernel printk buffer it can take
> very long to print them to console (especially when using slow serial
> console). This is undesirable during oops so when we encounter oops and
> there are more than 100 messages to print, print just the newest 100
> messages and then the oops message.

I think this patch will introduce a regression, so I'd probably prefer
not to include it now in the series.

The pattern "print something important, then panic()" is quite common.
Given that other CPUs can printk() a lot before the panic CPU sends out
the stop IPI, we can lose the "print something important" part.

...
arch/metag/kernel/cachepart.c:                  pr_emerg("Potential cache aliasing detected in %s on Thread %d\n",
arch/metag/kernel/cachepart.c-                           cache_type ? "DCACHE" : "ICACHE", thread_id);
arch/metag/kernel/cachepart.c-                  pr_warn("Total %s size: %u bytes\n",
arch/metag/kernel/cachepart.c-                          cache_type ? "DCACHE" : "ICACHE",
arch/metag/kernel/cachepart.c-                          cache_type ? get_dcache_size()
arch/metag/kernel/cachepart.c-                          : get_icache_size());
arch/metag/kernel/cachepart.c-                  pr_warn("Thread %s size: %d bytes\n",
arch/metag/kernel/cachepart.c-                          cache_type ? "CACHE" : "ICACHE",
arch/metag/kernel/cachepart.c-                          thread_cache_size);
arch/metag/kernel/cachepart.c-                  pr_warn("Page Size: %lu bytes\n", PAGE_SIZE);
arch/metag/kernel/cachepart.c-                  panic("Potential cache aliasing detected");
...
arch/s390/kernel/jump_label.c:  pr_emerg("Jump label code mismatch at %pS [%p]\n", ipc, ipc);
arch/s390/kernel/jump_label.c:  pr_emerg("Found:    %6ph\n", ipc);
arch/s390/kernel/jump_label.c:  pr_emerg("Expected: %6ph\n", ipe);
arch/s390/kernel/jump_label.c:  pr_emerg("New:      %6ph\n", ipn);
arch/s390/kernel/jump_label.c-  panic("Corrupted kernel text");
...



another example is hardlockup detector with sysctl_hardlockup_all_cpu_backtrace.

static void watchdog_overflow_callback(...)
{
	...
	if (is_hardlockup()) {
	...
		if (sysctl_hardlockup_all_cpu_backtrace &&
				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
			trigger_allbutself_cpu_backtrace();

		nmi_panic(regs, msg);
	...
	}
	...
}

trigger_allbutself_cpu_backtrace() can produce far more than 100 lines.
trigger_allbutself_cpu_backtrace() may or may not be implemented via
NMI; see, for example, arch/sparc/kernel/process_64.c.

thus, I think we had better avoid skipping any messages when in panic().

	-ss


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-17  0:34                 ` Sergey Senozhatsky
@ 2016-03-18  5:49                   ` Byungchul Park
  2016-03-18  7:11                     ` Sergey Senozhatsky
  0 siblings, 1 reply; 23+ messages in thread
From: Byungchul Park @ 2016-03-18  5:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Jan Kara,
	Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On Thu, Mar 17, 2016 at 09:34:50AM +0900, Sergey Senozhatsky wrote:
> > I am curious about how you make the wake_up_process() call and I may want
> > to talk about it at the next spin. Anyway, then we will lose the last
> > message when "if (logbuf_cpu == this_cpu)" acts. Is it acceptible?
> 
> yes, this is how it is. "BUG: recent printk recursion!" will be printed
> instead of the message.

I am not sure it's the best way. For example, in the case of a suspected
rq->lock lockup, we cannot report the rq's "lockup suspected" while a
printk() is asynchronously printing something else, whatever it is. We can
avoid the infinite recursion with the patch I attached below, even though
wake_up() and friends are used outside the section protected by
logbuf_lock.

> > IMHO it's not a good choice to use wake_up() and friend within a printk()
> > since it can additionally cause another recursion. Of course, it does not
> > happen if the condition (logbuf_cpu == this_cpu) acts. But I don't think
> > it's good to rely on the condition with losing a message. Anyway I really
> > really want to see your next spin and talk.
> 
> the alternative is NOT significantly better. the pending bit is checked in
> IRQ, so one can simply do
> 
> 	local_irq_save();
> 	while (xxx) printk();
> 	local_irq_restore();
> 
> and _in the worst case_ nothing will be printed to the console until IRQs are

Yes, you are right. But I am not sure yet.

> I'd probably prefer to add wake_up_process() to vprintk_emit() and do it
> under the logbuf lock. first, we don't suffer from disabled IRQs on the current
> CPU; second, we have somewhat better chances of breaking printk() recursion
> *in some cases*.

I think the logbuf_cpu is not for it. It's a kind of last resort. It would
be better to avoid using it if we can. And we can.
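(For reference, here is a hypothetical userspace sketch of the logbuf_cpu
check we are discussing. The names mirror the kernel's, but the logic is
heavily simplified; it only illustrates how a nested printk() from the same
CPU gets replaced by the recursion marker instead of deadlocking:)

```c
#include <stdio.h>

static int logbuf_cpu = -1;	/* id of the CPU currently in vprintk_emit() */
static char logbuf[64];		/* last stored message (one slot, for brevity) */

/*
 * Sketch of the vprintk_emit() recursion check: if the same CPU re-enters
 * while it already "holds" logbuf_lock (e.g. spin_dump() fires under the
 * lock), the nested message is dropped and replaced by a recursion marker.
 * 'recurse' simulates such a nested printk() happening under the lock.
 * Returns 1 if the message survived, 0 if it was lost to recursion.
 */
static int emit(int this_cpu, const char *msg, int recurse)
{
	int stored;

	if (logbuf_cpu == this_cpu) {
		snprintf(logbuf, sizeof(logbuf), "BUG: recent printk recursion!");
		return 0;			/* message lost */
	}
	logbuf_cpu = this_cpu;			/* "take" logbuf_lock */
	snprintf(logbuf, sizeof(logbuf), "%s", msg);
	stored = 1;
	if (recurse)				/* nested printk() from same CPU */
		stored = emit(this_cpu, "nested message", 0);
	logbuf_cpu = -1;			/* "release" logbuf_lock */
	return stored;
}
```

This is exactly the "last resort" behavior: the recursion is broken, but at
the cost of losing the original message.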

> > This cannot happen. console_lock() cannot continue because the prior
> > console_unlock() does not release console_sem.lock yet when
> > wake_up_process() is called. Only a deadlock exists. And my patch solves
> > the problem so that the deadlock cannot happen.
> 
> ah, we got lost in patches. I was talking about yet another patch
> (which you are probably not aware of; you were not Cc'd. Sorry!) that
> makes console_unlock() asynchronous:
> 
> http://marc.info/?l=linux-kernel&m=145750373530161

I checked it now. Do you mean the wake_up_process() introduced in the new
patch in console_unlock()? If so, I also think it does not create a deadlock;
it can just cause a recursion in the worst case. I thought it was the
wake_up_process() in up(), which is eventually called from console_unlock().
A deadlock can happen with the wake_up_process() in up(). :-)

Thanks,
Byungchul

-----8<-----
From 81f06a6f9c7f2e782267a2539c6c869d4214354c Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Fri, 18 Mar 2016 11:35:24 +0900
Subject: [PATCH] lib/spinlock_debug: Prevent an unnecessary recursive
 spin_dump()

Printing "lockup suspected" for the same lock more than once is
meaningless. Furthermore, it can cause an infinite recursion if it
happens while printk() is printing something.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/spinlock_debug.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index fd24588..30559c6 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -138,14 +138,25 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
 {
 	u64 i;
 	u64 loops = loops_per_jiffy * HZ;
+	static raw_spinlock_t *suspected_lock = NULL;
 
 	for (i = 0; i < loops; i++) {
 		if (arch_spin_trylock(&lock->raw_lock))
 			return;
 		__delay(1);
 	}
-	/* lockup suspected: */
-	spin_dump(lock, "lockup suspected");
+
+	/*
+	 * When we suspect a lockup, it's good enough to inform it once for
+	 * the same lock. Otherwise it could cause an infinite recursion if
+	 * it's within printk().
+	 */
+	if (suspected_lock != lock) {
+		suspected_lock = lock;
+		/* lockup suspected: */
+		spin_dump(lock, "lockup suspected");
+		suspected_lock = NULL;
+	}
 #ifdef CONFIG_SMP
 	trigger_all_cpu_backtrace();
 #endif
-- 
1.9.1


* Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-18  5:49                   ` Byungchul Park
@ 2016-03-18  7:11                     ` Sergey Senozhatsky
  2016-03-18  8:23                       ` byungchul.park
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-03-18  7:11 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel

On (03/18/16 14:49), Byungchul Park wrote:
[..]
> > http://marc.info/?l=linux-kernel&m=145750373530161
> 
> I checked it now. Do you mean the wake_up_process() introduced in the new
> patch in console_unlock()? If so, I also think it does not create a deadlock;
> it can just cause a recursion in the worst case. I thought it was the
> wake_up_process() in up(), which is eventually called from console_unlock().
> A deadlock can happen with the wake_up_process() in up(). :-)

I'm not addressing already existing problems here. I'm trying to
minimise the impact of the new code only.

[..]
> diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
> index fd24588..30559c6 100644
> --- a/kernel/locking/spinlock_debug.c
> +++ b/kernel/locking/spinlock_debug.c
> @@ -138,14 +138,25 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
>  {
>  	u64 i;
>  	u64 loops = loops_per_jiffy * HZ;
> +	static raw_spinlock_t *suspected_lock = NULL;

this has no chance of surviving on SMP systems that have spin-lockup-ed on at
least two different spin locks.
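to make that concrete, here is a hypothetical single-threaded interleaving of
the proposed check (kernel names reused, everything else simplified). the
single global slot cannot tell two concurrent lockups apart: a second CPU's
lockup on a different lock overwrites the slot, so the first CPU's nested
re-entry is no longer suppressed:

```c
#include <stddef.h>

typedef struct { int raw; } raw_spinlock_t;

/* two locks "lockup-ing" concurrently on two CPUs */
raw_spinlock_t lock_a, lock_b;

static raw_spinlock_t *suspected_lock = NULL;

/* the guard the proposed patch places around spin_dump("lockup suspected");
 * the slot stays set while the dump is in flight (the patch clears it after
 * spin_dump() returns, which is the window modeled here) */
static int dump_allowed(raw_spinlock_t *lock)
{
	if (suspected_lock != lock) {
		suspected_lock = lock;
		return 1;		/* spin_dump() runs */
	}
	return 0;			/* recursive dump suppressed */
}
```

with a single lock the suppression works; as soon as a second lock enters the
picture, the re-entry check matches the wrong lock and the recursion returns.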

I'd really prefer not to mix spin_dump/printk recursion problems into this
patch set. it makes sense not to make printk recursion detection worse due to
the spin_locks newly added to vprintk_emit(), but that's it. this patch set is
fixing other things in the first place.

	-ss

>  	for (i = 0; i < loops; i++) {
>  		if (arch_spin_trylock(&lock->raw_lock))
>  			return;
>  		__delay(1);
>  	}
> -	/* lockup suspected: */
> -	spin_dump(lock, "lockup suspected");
> +
> +	/*
> +	 * When we suspect a lockup, it's good enough to inform it once for
> +	 * the same lock. Otherwise it could cause an infinite recursion if
> +	 * it's within printk().
> +	 */
> +	if (suspected_lock != lock) {
> +		suspected_lock = lock;
> +		/* lockup suspected: */
> +		spin_dump(lock, "lockup suspected");
> +		suspected_lock = NULL;
> +	}
>  #ifdef CONFIG_SMP
>  	trigger_all_cpu_backtrace();
>  #endif
> -- 
> 1.9.1
> 


* RE: [RFC][PATCH v4 1/2] printk: Make printk() completely async
  2016-03-18  7:11                     ` Sergey Senozhatsky
@ 2016-03-18  8:23                       ` byungchul.park
  0 siblings, 0 replies; 23+ messages in thread
From: byungchul.park @ 2016-03-18  8:23 UTC (permalink / raw)
  To: 'Sergey Senozhatsky'
  Cc: 'Sergey Senozhatsky', 'Jan Kara',
	'Andrew Morton', 'Jan Kara',
	'Petr Mladek', 'Tejun Heo',
	'Tetsuo Handa',
	linux-kernel

> [..]
> > diff --git a/kernel/locking/spinlock_debug.c
> b/kernel/locking/spinlock_debug.c
> > index fd24588..30559c6 100644
> > --- a/kernel/locking/spinlock_debug.c
> > +++ b/kernel/locking/spinlock_debug.c
> > @@ -138,14 +138,25 @@ static void __spin_lock_debug(raw_spinlock_t
*lock)
> >  {
> >  	u64 i;
> >  	u64 loops = loops_per_jiffy * HZ;
> > +	static raw_spinlock_t *suspected_lock = NULL;
> 
> this has no chance of surviving on SMP systems that have spin-lockup-ed on
> at least two different spin locks.

I don't think so. It does have a chance to survive on SMP with two different
locks. Recall the conditions under which this problem can happen.

1. when it is "lockup suspected".
2. when it is within a printk().

Two different locks mean two different timings. Therefore it's important to
prevent a recursion at the place trying to obtain the lock. After one lock
is resolved, the other lock can be resolved step by step.

> I'd really prefer not to mix spin_dump/printk recursion problems into this

I didn't mix it into this. I only focused on your patch. Your approach
unnecessarily loses a message, which we don't want. But you worried
about "infinite recursion" when implementing it without losing the
message. That's why I am saying it doesn't need to be worried about,
and that there's another way, that is, my suggestion.

> patch set. it makes sense not to make printk recursion detection worse due to

It makes more sense not to introduce newly added spin_locks. You are
currently utilizing the last resort for detecting recursion. It's too bad.
Don't you think so?

> the spin_locks newly added to vprintk_emit(), but that's it. this patch set is
> fixing other things in the first place.

As I said, this patch is trying to fix a problem in your patch, which
was introduced while discussing yours.

Thanks,
Byungchul


* Re: [RFC][PATCH v4 0/2] printk: Make printk() completely async
  2016-03-14 14:13 [RFC][PATCH v4 0/2] printk: Make printk() completely async Sergey Senozhatsky
  2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
  2016-03-14 14:13 ` [RFC][PATCH v4 2/2] printk: Skip messages on oops Sergey Senozhatsky
@ 2016-04-23 19:36 ` Pavel Machek
  2016-04-24  5:03   ` Sergey Senozhatsky
  2 siblings, 1 reply; 23+ messages in thread
From: Pavel Machek @ 2016-04-23 19:36 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa,
	linux-kernel, Byungchul Park, Sergey Senozhatsky

On Mon 2016-03-14 23:13:38, Sergey Senozhatsky wrote:
> Hello,
> 
>  RFC
> 
>  The patch set is based on slightly updated Jan Kara's patches.
> 
> This patch set makes printk() completely asynchronous: new messages
> are getting upended to the kernel printk buffer, but instead of 'direct'
> printing the actual print job is performed by a dedicated kthread.
> This has the advantage that printing always happens from a schedulable
> context and thus we don't lockup any particular CPU or even
> interrupts.

And that means that printk() will become mostly useless for debugging,
right?
									Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [RFC][PATCH v4 0/2] printk: Make printk() completely async
  2016-04-23 19:36 ` [RFC][PATCH v4 0/2] printk: Make printk() completely async Pavel Machek
@ 2016-04-24  5:03   ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2016-04-24  5:03 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Petr Mladek,
	Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park,
	Sergey Senozhatsky

Hi,

On (04/23/16 21:36), Pavel Machek wrote:
> >  The patch set is based on slightly updated Jan Kara's patches.
> > 
> > This patch set makes printk() completely asynchronous: new messages
> > are getting upended to the kernel printk buffer, but instead of 'direct'
> > printing the actual print job is performed by a dedicated kthread.
> > This has the advantage that printing always happens from a schedulable
> > context and thus we don't lockup any particular CPU or even
> > interrupts.
> 
> And that means that printk() will become mostly useless for debugging,
> right?

what do you mean? printk by default operates in the 'old' sync mode
and can be switched to async by those who experience problems
with the sync one (whilst actually debugging). I'm not sure I
got your point.
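a rough userspace model of the async mode being discussed, in case it helps:
printk() only appends to the log buffer and wakes a dedicated printing thread,
so the slow console output happens in a schedulable context. pthreads stand in
for the kthread and wake_up_process(); all names are illustrative, not the
kernel's:

```c
#include <pthread.h>
#include <stdio.h>

#define LOG_SLOTS 64
#define MSG_LEN   64

static char logbuf[LOG_SLOTS][MSG_LEN];
static int log_head, log_tail;		/* tail = next message to print */
static int stop;
static int printed;			/* messages flushed so far */
static pthread_mutex_t logbuf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  log_wait    = PTHREAD_COND_INITIALIZER;

/* fast path: append and wake the printer; never touches the console */
static void async_printk(const char *msg)
{
	pthread_mutex_lock(&logbuf_lock);
	snprintf(logbuf[log_head % LOG_SLOTS], MSG_LEN, "%s", msg);
	log_head++;
	pthread_cond_signal(&log_wait);	/* wake_up_process() analogue */
	pthread_mutex_unlock(&logbuf_lock);
}

/* the dedicated printing thread: drains the buffer at its leisure */
static void *printk_kthread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&logbuf_lock);
	for (;;) {
		while (log_tail == log_head && !stop)
			pthread_cond_wait(&log_wait, &logbuf_lock);
		if (log_tail == log_head && stop)
			break;
		/* in the kernel the console write happens without logbuf_lock */
		printf("console: %s\n", logbuf[log_tail % LOG_SLOTS]);
		log_tail++;
		printed++;
	}
	pthread_mutex_unlock(&logbuf_lock);
	return NULL;
}

/* emit n messages, then flush and stop the printer; returns count printed */
static int run_demo(int n)
{
	pthread_t tid;
	char msg[MSG_LEN];
	int i;

	pthread_create(&tid, NULL, printk_kthread, NULL);
	for (i = 0; i < n; i++) {
		snprintf(msg, sizeof(msg), "message %d", i);
		async_printk(msg);
	}
	pthread_mutex_lock(&logbuf_lock);
	stop = 1;
	pthread_cond_signal(&log_wait);
	pthread_mutex_unlock(&logbuf_lock);
	pthread_join(tid, NULL);
	return printed;
}
```

the point of the sync/async split is exactly this separation: the producer
side stays cheap and lock-short, while the consumer side can sleep, be
preempted, or fall behind without pinning the producer's CPU.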

	-ss


Thread overview: 23+ messages
2016-03-14 14:13 [RFC][PATCH v4 0/2] printk: Make printk() completely async Sergey Senozhatsky
2016-03-14 14:13 ` [RFC][PATCH v4 1/2] " Sergey Senozhatsky
2016-03-15 10:03   ` Jan Kara
2016-03-15 14:07     ` Sergey Senozhatsky
2016-03-16  5:39       ` Byungchul Park
2016-03-16  6:58         ` Sergey Senozhatsky
2016-03-16  7:30           ` Byungchul Park
2016-03-16  7:56             ` Sergey Senozhatsky
2016-03-16 10:34               ` Byungchul Park
2016-03-17  0:34                 ` Sergey Senozhatsky
2016-03-18  5:49                   ` Byungchul Park
2016-03-18  7:11                     ` Sergey Senozhatsky
2016-03-18  8:23                       ` byungchul.park
2016-03-16  7:00         ` Byungchul Park
2016-03-16  7:07           ` Sergey Senozhatsky
2016-03-15 15:58   ` Petr Mladek
2016-03-16  2:01     ` Sergey Senozhatsky
2016-03-16  2:10       ` Byungchul Park
2016-03-16  2:31         ` Sergey Senozhatsky
2016-03-14 14:13 ` [RFC][PATCH v4 2/2] printk: Skip messages on oops Sergey Senozhatsky
2016-03-17 10:56   ` Sergey Senozhatsky
2016-04-23 19:36 ` [RFC][PATCH v4 0/2] printk: Make printk() completely async Pavel Machek
2016-04-24  5:03   ` Sergey Senozhatsky
