All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance
@ 2014-03-13 15:58 Jan Kara
  2014-03-13 15:58 ` [PATCH 1/6] printk: Remove outdated comment Jan Kara
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

  Hello,

  this is another piece of the printk softlockup saga series. Let me first
remind the problem:

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious since vprintk_emit() calls console_unlock()
with interrupts disabled.
    
In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.
---

This is a revised series using my new approach to the problem which doesn't
let CPU out of console_unlock() until there's someone else to take over the
printing. The main difference since the last version is that instead of
passing printing duty to different CPUs via IPIs we use dedicated kthreads.
This method is somewhat less reliable (in a sense that there are more
situations in which handover needn't work at all - e.g. when the currently
printing CPU holds a spinlock and the CPU where kthread is scheduled to run is
spinning on this spinlock) but the code is much simpler and in my practical
testing kthread approach was good enough to avoid any problems (with one
exception - see below).

The patch set is organized as follows:

Patch 1 is just a cleanup which can be taken on its own (a result of my
research in kernel history ;).

Patches 2 and 3 change vprintk_emit() to call console_unlock() with interrupts
enabled so they help to reduce interrupt latency in common case when printk()
itself is called with interrupts enabled.

Patch 4 is a cleanup of printk_sched() facility from Steven.

Patch 5 is the meat of this series implementing passing of printing duty from
current CPU after printing printk.offload_chars. We wake up a kthread which
starts spinning on console_sem to take over printing.

Patch 6 fixes lockups in one situation which I hit in my testing - if someone
calls stop_machine(), handing over of printing stops working (see commit
message for details). If you find this too hacky, I'm ok with dropping this
patch at least for now because I'm not sure this problem can be hit without
artificially stressing the machine.

What do you guys think?

						Merry Christmas ;)
								Honza

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] printk: Remove outdated comment
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 15:58 ` [PATCH 2/6] printk: Release lockbuf_lock before calling console_trylock_for_printk() Jan Kara
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

Comment about interesting interlocking between lockbuf_lock and
console_sem is outdated. It was added in 2002 by commit
a880f45a48be2956d2c78a839c472287d54435c1 during conversion of
console_lock to console_sem + lockbuf_lock. At that time
release_console_sem() (today's equivalent is console_unlock()) was
indeed using lockbuf_lock to avoid races between trylock on console_sem
in printk() and unlock of console_sem. However these days the
interlocking is gone and the races are avoided by rechecking logbuf
state after releasing console_sem.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4dae9cbe9259..bd7ee2a9f960 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -206,8 +206,7 @@ struct printk_log {
 };
 
 /*
- * The logbuf_lock protects kmsg buffer, indices, counters. It is also
- * used in interesting ways to provide interlocking in console_unlock();
+ * The logbuf_lock protects kmsg buffer, indices, counters.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] printk: Release lockbuf_lock before calling console_trylock_for_printk()
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
  2014-03-13 15:58 ` [PATCH 1/6] printk: Remove outdated comment Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 15:58 ` [PATCH 3/6] printk: Enable interrupts " Jan Kara
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

There's no reason to hold lockbuf_lock when entering
console_trylock_for_printk(). The first thing this function does is
calling down_trylock(console_sem) and if that fails it immediately
unlocks lockbuf_lock. So lockbuf_lock isn't needed for that branch.
When down_trylock() succeeds, the rest of console_trylock() is OK
without lockbuf_lock (it is called without it from other places), and
the only remaining thing in console_trylock_for_printk() is
can_use_console() call. For that call console_sem is enough (it
iterates all consoles and checks CON_ANYTIME flag).

So we drop logbuf_lock before entering console_trylock_for_printk()
which simplifies the code.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 49 +++++++++++++++++--------------------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bd7ee2a9f960..7a8ffd89875c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -249,9 +249,6 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int logbuf_cpu = UINT_MAX;
-
 /* human readable text of the record */
 static char *log_text(const struct printk_log *msg)
 {
@@ -1332,36 +1329,22 @@ static inline int can_use_console(unsigned int cpu)
  * messages from a 'printk'. Return true (and with the
  * console_lock held, and 'console_locked' set) if it
  * is successful, false otherwise.
- *
- * This gets called with the 'logbuf_lock' spinlock held and
- * interrupts disabled. It should return with 'lockbuf_lock'
- * released but interrupts still disabled.
  */
 static int console_trylock_for_printk(unsigned int cpu)
-	__releases(&logbuf_lock)
 {
-	int retval = 0, wake = 0;
-
-	if (console_trylock()) {
-		retval = 1;
-
-		/*
-		 * If we can't use the console, we need to release
-		 * the console semaphore by hand to avoid flushing
-		 * the buffer. We need to hold the console semaphore
-		 * in order to do this test safely.
-		 */
-		if (!can_use_console(cpu)) {
-			console_locked = 0;
-			wake = 1;
-			retval = 0;
-		}
-	}
-	logbuf_cpu = UINT_MAX;
-	raw_spin_unlock(&logbuf_lock);
-	if (wake)
+	if (!console_trylock())
+		return 0;
+	/*
+	 * If we can't use the console, we need to release the console
+	 * semaphore by hand to avoid flushing the buffer. We need to hold the
+	 * console semaphore in order to do this test safely.
+	 */
+	if (!can_use_console(cpu)) {
+		console_locked = 0;
 		up(&console_sem);
-	return retval;
+		return 0;
+	}
+	return 1;
 }
 
 int printk_delay_msec __read_mostly;
@@ -1494,6 +1477,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	/* cpu currently holding logbuf_lock in this function */
+	static volatile unsigned int logbuf_cpu = UINT_MAX;
+
 
 	boot_delay_msec(level);
 	printk_delay();
@@ -1609,13 +1595,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 	}
 	printed_len += text_len;
 
+	logbuf_cpu = UINT_MAX;
+	raw_spin_unlock(&logbuf_lock);
 	/*
 	 * Try to acquire and then immediately release the console semaphore.
 	 * The release will print out buffers and wake up /dev/kmsg and syslog()
 	 * users.
-	 *
-	 * The console_trylock_for_printk() function will release 'logbuf_lock'
-	 * regardless of whether it actually gets the console semaphore or not.
 	 */
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] printk: Enable interrupts before calling console_trylock_for_printk()
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
  2014-03-13 15:58 ` [PATCH 1/6] printk: Remove outdated comment Jan Kara
  2014-03-13 15:58 ` [PATCH 2/6] printk: Release lockbuf_lock before calling console_trylock_for_printk() Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 15:58 ` [PATCH 4/6] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().

We can loose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7a8ffd89875c..56649edfae9c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1314,10 +1314,9 @@ static int have_callable_console(void)
 /*
  * Can we actually use the console at this time on this cpu?
  *
- * Console drivers may assume that per-cpu resources have
- * been allocated. So unless they're explicitly marked as
- * being able to cope (CON_ANYTIME) don't call them until
- * this CPU is officially up.
+ * Console drivers may assume that per-cpu resources have been allocated. So
+ * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
+ * call them until this CPU is officially up.
  */
 static inline int can_use_console(unsigned int cpu)
 {
@@ -1330,8 +1329,10 @@ static inline int can_use_console(unsigned int cpu)
  * console_lock held, and 'console_locked' set) if it
  * is successful, false otherwise.
  */
-static int console_trylock_for_printk(unsigned int cpu)
+static int console_trylock_for_printk(void)
 {
+	unsigned int cpu = smp_processor_id();
+
 	if (!console_trylock())
 		return 0;
 	/*
@@ -1501,7 +1502,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (!oops_in_progress && !lockdep_recursing(current)) {
 			recursion_bug = 1;
-			goto out_restore_irqs;
+			local_irq_restore(flags);
+			return 0;
 		}
 		zap_locks();
 	}
@@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
+	lockdep_on();
+	local_irq_restore(flags);
+
+	/*
+	 * Disable preemption to avoid being preempted while holding
+	 * console_sem which would prevent anyone from printing to console
+	 */
+	preempt_disable();
 	/*
 	 * Try to acquire and then immediately release the console semaphore.
 	 * The release will print out buffers and wake up /dev/kmsg and syslog()
 	 * users.
 	 */
-	if (console_trylock_for_printk(this_cpu))
+	if (console_trylock_for_printk())
 		console_unlock();
-
-	lockdep_on();
-out_restore_irqs:
-	local_irq_restore(flags);
+	preempt_enable();
 
 	return printed_len;
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] printk: Remove separate printk_sched buffers and use printk buf instead
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
                   ` (2 preceding siblings ...)
  2014-03-13 15:58 ` [PATCH 3/6] printk: Enable interrupts " Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 15:58 ` [PATCH 5/6] printk: Hand over printing to console if printing too long Jan Kara
  2014-03-13 15:58 ` [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
  5 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

From: Steven Rostedt <rostedt@goodmis.org>

To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.

What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.

There's a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu. This is a quite a
bit of space wasted for something that is seldom used.

In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the
up(&console_sem) may do a wake up of any pending waiters. This must be
avoided while holding the logbuf_lock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 47 +++++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 56649edfae9c..91c554e027c5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -68,6 +68,9 @@ int console_printk[4] = {
 	DEFAULT_CONSOLE_LOGLEVEL,	/* default_console_loglevel */
 };
 
+/* Deferred messaged from sched code are marked by this special level */
+#define SCHED_MESSAGE_LOGLEVEL -2
+
 /*
  * Low level drivers may need that to know if they can schedule in
  * their unblank() callback or not. So let's export it.
@@ -206,7 +209,9 @@ struct printk_log {
 };
 
 /*
- * The logbuf_lock protects kmsg buffer, indices, counters.
+ * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
+ * within the scheduler's rq lock. It must be released before calling
+ * console_unlock() or anything else that might wake up a process.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
@@ -1473,14 +1478,19 @@ asmlinkage int vprintk_emit(int facility, int level,
 	static int recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
-	size_t text_len;
+	size_t text_len = 0;
 	enum log_flags lflags = 0;
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	bool in_sched = false;
 	/* cpu currently holding logbuf_lock in this function */
 	static volatile unsigned int logbuf_cpu = UINT_MAX;
 
+	if (level == SCHED_MESSAGE_LOGLEVEL) {
+		level = -1;
+		in_sched = true;
+	}
 
 	boot_delay_msec(level);
 	printk_delay();
@@ -1527,7 +1537,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
 	 */
-	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+	if (in_sched)
+		text_len = scnprintf(text, sizeof(textbuf),
+				     KERN_WARNING "[sched_delayed] ");
+
+	text_len += vscnprintf(text + text_len,
+			       sizeof(textbuf) - text_len, fmt, args);
 
 	/* mark and strip a trailing newline */
 	if (text_len && text[text_len-1] == '\n') {
@@ -1602,6 +1617,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
+	/* If called from the scheduler, we can not call up(). */
+	if (in_sched)
+		return printed_len;
+
 	/*
 	 * Disable preemption to avoid being preempted while holding
 	 * console_sem which would prevent anyone from printing to console
@@ -2423,21 +2442,19 @@ late_initcall(printk_late_init);
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_BUF_SIZE		512
-
 #define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_SCHED	0x02
+#define PRINTK_PENDING_OUTPUT	0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
-static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
 
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_SCHED) {
-		char *buf = __get_cpu_var(printk_sched_buf);
-		pr_warn("[sched_delayed] %s", buf);
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
@@ -2461,21 +2478,15 @@ void wake_up_klogd(void)
 
 int printk_sched(const char *fmt, ...)
 {
-	unsigned long flags;
 	va_list args;
-	char *buf;
 	int r;
 
-	local_irq_save(flags);
-	buf = __get_cpu_var(printk_sched_buf);
-
 	va_start(args, fmt);
-	r = vsnprintf(buf, PRINTK_BUF_SIZE, fmt, args);
+	r = vprintk_emit(0, SCHED_MESSAGE_LOGLEVEL, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
+	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
-	local_irq_restore(flags);
 
 	return r;
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
                   ` (3 preceding siblings ...)
  2014-03-13 15:58 ` [PATCH 4/6] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 23:06   ` Andrew Morton
  2014-03-13 15:58 ` [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
  5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious problem if the printk() calling
console_unlock() was called with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.

This patch implements a mechanism where after printing specified number
of characters (tunable as a kernel parameter printk.offload_chars), CPU
doing printing asks for help by setting PRINTK_HANDOVER_B bit in
printk_handover_state variable and wakes up one of dedicated kthreads.
As soon as the printing CPU notices kthread got scheduled and is
spinning on console_sem, it drops console_sem and exits
console_unlock(). kthread then takes over printing instead.  This way no
CPU should spend printing too long even if there is heavy printk
traffic.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt |  16 +++
 kernel/printk/printk.c              | 196 ++++++++++++++++++++++++++++++++----
 2 files changed, 195 insertions(+), 17 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7116fda7077f..bb4318a6a2c7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2621,6 +2621,22 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
+	printk.offload_chars=
+			Printing to console can be relatively slow especially
+			in case of serial console. When there is intensive
+			printing happening from several cpus (as is the case
+			during boot), a cpu can be spending significant time
+			(seconds or more) doing printing. To avoid softlockups,
+			lost interrupts, and similar problems other cpus
+			will take over printing after the currently printing
+			cpu has printed 'printk.offload_chars' characters.
+			Higher value means possibly longer interrupt and other
+			latencies but lower latency of printing and lower
+			chance something goes wrong during system crash and
+			important message is not printed.
+			Format: <number>
+			default: 0 (disabled)
+
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 91c554e027c5..8ef179b170d4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include <linux/poll.h>
 #include <linux/irq_work.h>
 #include <linux/utsname.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
@@ -87,6 +88,31 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+/* State machine for handing over printing */
+enum {
+	/*
+	 * Set by the holder of console_sem if currently printing task wants to
+	 * hand over printing. Cleared before console_sem is released.
+	 */
+	PRINTK_HANDOVER_B,
+	/*
+	 * Set if there's someone spinning on console_sem to take over printing.
+	 * Cleared after acquiring console_sem.
+	 */
+	PRINTK_CONSOLE_SPIN_B,
+};
+static long printk_handover_state;
+
+/*
+ * Number of kernel threads for offloading printing. We need at least two so
+ * that they can hand over printing from one to another one and thus switch
+ * CPUs.
+ */
+#define PRINTING_TASKS 2
+
+/* Wait queue printing tasks sleep on when idle */
+static DECLARE_WAIT_QUEUE_HEAD(print_queue); 
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
@@ -254,6 +280,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How many characters can we print in one call of printk before asking
+ * other cpus to continue printing. 0 means infinity. Tunable via
+ * printk.offload_chars kernel parameter. Our default 1000 means about
+ * 0.1s maximum latency due to printing.
+ */
+static unsigned int __read_mostly printk_offload_chars = 1000;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+	" cpu after this number of characters");
+
 /* human readable text of the record */
 static char *log_text(const struct printk_log *msg)
 {
@@ -1942,6 +1981,7 @@ void console_lock(void)
 	if (console_suspended)
 		return;
 	console_locked = 1;
+	printk_handover_state = 0;
 	console_may_schedule = 1;
 	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
 }
@@ -1964,12 +2004,45 @@ int console_trylock(void)
 		return 0;
 	}
 	console_locked = 1;
+	printk_handover_state = 0;
 	console_may_schedule = 0;
 	mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_);
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
 
+/*
+ * This is a version of console_lock() which spins to acquire console_sem.
+ * It is only for use by threads that take care of flushing printk buffer so
+ * that they can be sure they are not preempted while waiting for console_sem.
+ *
+ * The function returns 1 if we acquired console_sem, 0 if we failed (either
+ * someone else is already spinning, someone acquired console_sem, or console
+ * is suspended).
+ */
+static int console_lock_try_spin(void)
+{
+	WARN_ON_ONCE(!in_atomic());
+	/* Someone already spinning? Don't waste cpu time... */
+	if (test_and_set_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state))
+		return 0;
+	while (down_trylock(&console_sem)) {
+		/* Someone else took console_sem? */
+		if (!test_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state))
+			return 0;
+		cpu_relax();
+	}
+	printk_handover_state = 0;
+	if (console_suspended) {
+		up(&console_sem);
+		return 0;
+	}
+	console_locked = 1;
+	console_may_schedule = 0;
+	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
+	return 1;
+}
+
 int is_console_locked(void)
 {
 	return console_locked;
@@ -2004,15 +2077,44 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * Returns true iff there is other cpu waiting to take over printing. This
+ * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
+ * printing to some other cpu.
+ */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	if (!printk_offload_chars || printed_chars < printk_offload_chars)
+		return false;
+	/* Someone is spinning on console_sem? Give away to him. */
+	if (test_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state))
+		return true;
+	if (!test_bit(PRINTK_HANDOVER_B, &printk_handover_state)) {
+		set_bit(PRINTK_HANDOVER_B, &printk_handover_state);
+		/*
+		 * Paired with barrier in prepare_to_wait_exclusive() in
+		 * printing_task()
+		 */
+		smp_mb();
+		wake_up(&print_queue);
+	}
+	return false;
+}
+
 /**
  * console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, console_unlock() emits the output prior to
+ * releasing the lock. However we need not write all the data in the buffer if
+ * we would hog the CPU for too long. In such case we try to hand over printing
+ * to a different cpu.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
@@ -2025,6 +2127,8 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	bool hand_over = false;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
@@ -2041,6 +2145,11 @@ again:
 		size_t len;
 		int level;
 
+		if (cpu_stop_printing(printed_chars)) {
+			hand_over = true;
+			break;
+		}
+
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
@@ -2054,8 +2163,10 @@ again:
 			console_prev = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq == log_next_seq) {
+			raw_spin_unlock(&logbuf_lock);
 			break;
+		}
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2086,31 +2197,39 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
 		start_critical_timings();
+		printed_chars += len;
 		local_irq_restore(flags);
 	}
-	console_locked = 0;
-	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))
 		exclusive_console = NULL;
 
-	raw_spin_unlock(&logbuf_lock);
-
+	/* Save modification of printk_handover_state in the common fast path */
+	if (test_bit(PRINTK_HANDOVER_B, &printk_handover_state))
+		clear_bit(PRINTK_HANDOVER_B, &printk_handover_state);
+	console_locked = 0;
+	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 	up(&console_sem);
 
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * Subtlety: We have interrupts disabled iff hand_over == false (to
+	 * save one cli/sti pair in the fast path.
 	 */
-	raw_spin_lock(&logbuf_lock);
-	retry = console_seq != log_next_seq;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	if (!hand_over) {
+		/*
+		 * Someone could have filled up the buffer again, so re-check
+		 * if there's something to flush. In case we cannot trylock the
+		 * console_sem again, there's a new owner and the
+		 * console_unlock() from them will do the flush, no worries.
+		 */
+		raw_spin_lock(&logbuf_lock);
+		retry = console_seq != log_next_seq;
+		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (retry && console_trylock())
-		goto again;
+		if (retry && console_trylock())
+			goto again;
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
@@ -2148,6 +2267,7 @@ void console_unblank(void)
 		console_lock();
 
 	console_locked = 1;
+	printk_handover_state = 0;
 	console_may_schedule = 0;
 	for_each_console(c)
 		if ((c->flags & CON_ENABLED) && c->unblank)
@@ -2424,9 +2544,38 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/* Kthread which takes over printing from a CPU which asks for help */
+static int printing_task(void *arg)
+{
+	DEFINE_WAIT(wait);
+
+	while (1) {
+		prepare_to_wait_exclusive(&print_queue, &wait,
+					  TASK_INTERRUPTIBLE);
+		if (!test_bit(PRINTK_HANDOVER_B, &printk_handover_state))
+			schedule();
+		finish_wait(&print_queue, &wait);
+		/*
+		 * We don't want to be scheduled away once we got the CPU (that
+		 * would be especially problematic if we hold console_sem at
+		 * that moment since noone else could print to console). So
+		 * disable preemption and spin on console_sem. We shouldn't
+		 * spin for long since printing CPU drops console_sem as soon
+		 * as it notices there is someone spinning on it.
+		 */
+		preempt_disable();
+		if (console_lock_try_spin())
+			console_unlock();
+		preempt_enable();
+	}
+	return 0;
+}
+
 static int __init printk_late_init(void)
 {
 	struct console *con;
+	int i;
+	struct task_struct *task;
 
 	for_each_console(con) {
 		if (!keep_bootcon && con->flags & CON_BOOT) {
@@ -2434,6 +2583,19 @@ static int __init printk_late_init(void)
 		}
 	}
 	hotcpu_notifier(console_cpu_notify, 0);
+
+	/* Does any handover of printing have any sence? */
+	if (num_possible_cpus() <= 1)
+		return 0;
+
+	for (i = 0; i < PRINTING_TASKS; i++) {
+		task = kthread_run(printing_task, NULL, "print/%d", i);
+		if (IS_ERR(task)) {
+			pr_err("printk: Cannot create printing thread: %ld\n",
+			       PTR_ERR(task));
+		}
+	}
+
 	return 0;
 }
 late_initcall(printk_late_init);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing
  2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
                   ` (4 preceding siblings ...)
  2014-03-13 15:58 ` [PATCH 5/6] printk: Hand over printing to console if printing too long Jan Kara
@ 2014-03-13 15:58 ` Jan Kara
  2014-03-13 23:09   ` Andrew Morton
  5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2014-03-13 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker, Jan Kara

When there are lots of messages accumulated in printk buffer, printing
them (especially over serial console) can take a long time (tens of
seconds). stop_machine() will effectively make all cpus spin in
multi_cpu_stop() waiting for the CPU doing printing to print all the
messages which triggers NMI softlockup watchdog and RCU stall detector
which add even more to the messages to print. Since machine doesn't do
anything (except serving interrupts) during this time, also network
connections are dropped and other disturbances may happen.

Paper over the problem by waiting for printk buffer to be empty before
starting to stop CPUs. In theory a burst of new messages can be appended
to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100%
solution but it works OK in practice and I'm not aware of a reasonably
simple better solution.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/console.h |  1 +
 kernel/printk/printk.c  | 22 ++++++++++++++++++++++
 kernel/stop_machine.c   |  9 +++++++++
 3 files changed, 32 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16bd653..c61c169f85b3 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -150,6 +150,7 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+extern void console_flush(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8ef179b170d4..9e76db8b0a53 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2296,6 +2296,28 @@ struct tty_driver *console_device(int *index)
 }
 
 /*
+ * Wait until all messages accumulated in the printk buffer are printed to
+ * console. Note that as soon as this function returns, new messages may be
+ * added to the printk buffer by other CPUs.
+ */
+void console_flush(void)
+{
+	bool retry;
+	unsigned long flags;
+
+	while (1) {
+		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		retry = console_seq != log_next_seq;
+		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+		if (!retry)
+			break;
+		/* Cycle console_sem to wait for outstanding printing */
+		console_lock();
+		console_unlock();
+	}
+}
+
+/*
  * Prevent further output on the passed console device so that (for example)
  * serial drivers can disable console output before suspending a port, and can
  * re-enable output afterwards.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 84571e09c907..14ac740e0c7f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/console.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -574,6 +575,14 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 		return ret;
 	}
 
+	/*
+	 * If there are lots of outstanding messages, printing them can take a
+	 * long time and all cpus would be spinning waiting for the printing to
+	 * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the
+	 * printing here to avoid these.
+	 */
+	console_flush();
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&msdata, MULTI_STOP_PREPARE);
 	return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 15:58 ` [PATCH 5/6] printk: Hand over printing to console if printing too long Jan Kara
@ 2014-03-13 23:06   ` Andrew Morton
  2014-03-13 23:26     ` One Thousand Gnomes
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Andrew Morton @ 2014-03-13 23:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker

On Thu, 13 Mar 2014 16:58:37 +0100 Jan Kara <jack@suse.cz> wrote:

> Currently, console_unlock() prints messages from kernel printk buffer to
> console while the buffer is non-empty. When serial console is attached,
> printing is slow and thus other CPUs in the system have plenty of time
> to append new messages to the buffer while one CPU is printing. Thus the
> CPU can spend unbounded amount of time doing printing in console_unlock().
> This is especially serious problem if the printk() calling
> console_unlock() was called with interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Also during artificial stress testing SATA disk
> disappears from the system because its interrupts aren't served for too
> long.
> 
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by setting PRINTK_HANDOVER_B bit in
> printk_handover_state variable and wakes up one of dedicated kthreads.
> As soon as the printing CPU notices kthread got scheduled and is
> spinning on console_sem, it drops console_sem and exits
> console_unlock(). kthread then takes over printing instead.  This way no
> CPU should spend printing too long even if there is heavy printk
> traffic.

Sigh ;)

Creating yet more kernel threads to solve this problem is a bit sad. 
Is there no way in which we can borrow one of the existing threads?

Do we need those threads (or this code) at all on uniprocessor builds?

I wonder if we should make printk_offload_chars=0 the default, so only
those people with a squillion disks incur the overhead.  But we'd still
have the "print" kernel threads, so maybe a different knob to enable the
feature?  If we do this we should set the knob to "true" for a month or
two to get testing coverage.

> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2621,6 +2621,22 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  			default: disabled
>  
> +	printk.offload_chars=
> +			Printing to console can be relatively slow especially
> +			in case of serial console. When there is intensive
> +			printing happening from several cpus (as is the case
> +			during boot), a cpu can be spending significant time
> +			(seconds or more) doing printing. To avoid softlockups,
> +			lost interrupts, and similar problems other cpus
> +			will take over printing after the currently printing
> +			cpu has printed 'printk.offload_chars' characters.
> +			Higher value means possibly longer interrupt and other
> +			latencies but lower latency of printing and lower
> +			chance something goes wrong during system crash and
> +			important message is not printed.
> +			Format: <number>
> +			default: 0 (disabled)

The default is 1000!

>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
>
> ...
>
> +/*
> + * Returns true iff there is other cpu waiting to take over printing. This
> + * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
> + * printing to some other cpu.
> + */
> +static bool cpu_stop_printing(int printed_chars)
> +{
> +	/* Oops? Print everything now to maximize chances user will see it */
> +	if (oops_in_progress)
> +		return false;
> +	if (!printk_offload_chars || printed_chars < printk_offload_chars)
> +		return false;
> +	/* Someone is spinning on console_sem? Give away to him. */
> +	if (test_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state))
> +		return true;
> +	if (!test_bit(PRINTK_HANDOVER_B, &printk_handover_state)) {
> +		set_bit(PRINTK_HANDOVER_B, &printk_handover_state);
> +		/*
> +		 * Paired with barrier in prepare_to_wait_exclusive() in
> +		 * printing_task()
> +		 */
> +		smp_mb();
> +		wake_up(&print_queue);
> +	}
> +	return false;
> +}

What happens if someone does a massive printk really early, perhaps
before the CPU scheduler is up and running?  Presumably things still
work OK, and the output will appear later in boot?

It might be sane to return false until system_state==something, unsure.

>  /**
>   * console_unlock - unlock the console system
>   *
>   * Releases the console_lock which the caller holds on the console system
>   * and the console driver list.
>   *
> - * While the console_lock was held, console output may have been buffered
> - * by printk().  If this is the case, console_unlock(); emits
> - * the output prior to releasing the lock.
> + * While the console_lock was held, console output may have been buffered by
> + * printk(). If this is the case, console_unlock() emits the output prior to
> + * releasing the lock. However we need not write all the data in the buffer if
> + * we would hog the CPU for too long. In such case we try to hand over printing
> + * to a different cpu.
>   *
>   * If there is output waiting, we wake /dev/kmsg and syslog() users.
>   *
> @@ -2025,6 +2127,8 @@ void console_unlock(void)
>  	unsigned long flags;
>  	bool wake_klogd = false;
>  	bool retry;
> +	bool hand_over = false;
> +	int printed_chars = 0;
>  
>  	if (console_suspended) {
>  		up(&console_sem);
> @@ -2041,6 +2145,11 @@ again:
>  		size_t len;
>  		int level;
>  
> +		if (cpu_stop_printing(printed_chars)) {
> +			hand_over = true;
> +			break;
> +		}
> +
>  		raw_spin_lock_irqsave(&logbuf_lock, flags);
>  		if (seen_seq != log_next_seq) {
>  			wake_klogd = true;
> @@ -2054,8 +2163,10 @@ again:
>  			console_prev = 0;
>  		}
>  skip:
> -		if (console_seq == log_next_seq)
> +		if (console_seq == log_next_seq) {
> +			raw_spin_unlock(&logbuf_lock);
>  			break;
> +		}
>  
>  		msg = log_from_idx(console_idx);
>  		if (msg->flags & LOG_NOCONS) {
> @@ -2086,31 +2197,39 @@ skip:
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(level, text, len);
>  		start_critical_timings();
> +		printed_chars += len;
>  		local_irq_restore(flags);
>  	}
> -	console_locked = 0;
> -	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
>  
>  	/* Release the exclusive_console once it is used */
>  	if (unlikely(exclusive_console))
>  		exclusive_console = NULL;
>  
> -	raw_spin_unlock(&logbuf_lock);
> -
> +	/* Save modification of printk_handover_state in the common fast path */
> +	if (test_bit(PRINTK_HANDOVER_B, &printk_handover_state))
> +		clear_bit(PRINTK_HANDOVER_B, &printk_handover_state);

If this is safe then we can use __clear_bit() ;)

> +	console_locked = 0;
> +	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
>  	up(&console_sem);
>  
>  	/*
> -	 * Someone could have filled up the buffer again, so re-check if there's
> -	 * something to flush. In case we cannot trylock the console_sem again,
> -	 * there's a new owner and the console_unlock() from them will do the
> -	 * flush, no worries.
> +	 * Subtlety: We have interrupts disabled iff hand_over == false (to
> +	 * save one cli/sti pair in the fast path.
>  	 */
> -	raw_spin_lock(&logbuf_lock);
> -	retry = console_seq != log_next_seq;
> -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +	if (!hand_over) {
> +		/*
> +		 * Someone could have filled up the buffer again, so re-check
> +		 * if there's something to flush. In case we cannot trylock the
> +		 * console_sem again, there's a new owner and the
> +		 * console_unlock() from them will do the flush, no worries.
> +		 */
> +		raw_spin_lock(&logbuf_lock);
> +		retry = console_seq != log_next_seq;
> +		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  
> -	if (retry && console_trylock())
> -		goto again;
> +		if (retry && console_trylock())
> +			goto again;
> +	}
>  
>  	if (wake_klogd)
>  		wake_up_klogd();
>
> ...
>
> @@ -2434,6 +2583,19 @@ static int __init printk_late_init(void)
>  		}
>  	}
>  	hotcpu_notifier(console_cpu_notify, 0);
> +
> +	/* Does any handover of printing have any sence? */

"Does handover of printing make any sense?"

> +	if (num_possible_cpus() <= 1)
> +		return 0;
> +
> +	for (i = 0; i < PRINTING_TASKS; i++) {
> +		task = kthread_run(printing_task, NULL, "print/%d", i);
> +		if (IS_ERR(task)) {
> +			pr_err("printk: Cannot create printing thread: %ld\n",
> +			       PTR_ERR(task));
> +		}
> +	}
> +


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing
  2014-03-13 15:58 ` [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
@ 2014-03-13 23:09   ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2014-03-13 23:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, pmladek, Steven Rostedt, Frederic Weisbecker

On Thu, 13 Mar 2014 16:58:38 +0100 Jan Kara <jack@suse.cz> wrote:

> When there are lots of messages accumulated in printk buffer, printing
> them (especially over serial console) can take a long time (tens of
> seconds). stop_machine() will effectively make all cpus spin in
> multi_cpu_stop() waiting for the CPU doing printing to print all the
> messages which triggers NMI softlockup watchdog and RCU stall detector
> which add even more to the messages to print. Since machine doesn't do
> anything (except serving interrupts) during this time, also network
> connections are dropped and other disturbances may happen.
> 
> Paper over the problem by waiting for printk buffer to be empty before
> starting to stop CPUs. In theory a burst of new messages can be appended
> to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100%
> solution but it works OK in practice and I'm not aware of a reasonably
> simple better solution.
> 

Yes it's rather hacky, but it's simple and direct and explicit and
obvious.  It's the stealth hackiness which causes harm.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 23:06   ` Andrew Morton
@ 2014-03-13 23:26     ` One Thousand Gnomes
  2014-03-14 12:26       ` Jan Kara
  2014-03-14 13:00     ` Jan Kara
  2014-03-14 13:30     ` Jan Kara
  2 siblings, 1 reply; 13+ messages in thread
From: One Thousand Gnomes @ 2014-03-13 23:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, LKML, pmladek, Steven Rostedt, Frederic Weisbecker

> Creating yet more kernel threads to solve this problem is a bit sad. 
> Is there no way in which we can borrow one of the existing threads?

schedule_work()

Surely that's all we need, if we get bunged up for a bit then it ends up
on a work queue item and everyone with a billion disks and a 9600 baud
serial port is still happy. Nobody else will ever see it.

Alan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 23:26     ` One Thousand Gnomes
@ 2014-03-14 12:26       ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-14 12:26 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Andrew Morton, Jan Kara, LKML, pmladek, Steven Rostedt,
	Frederic Weisbecker

On Thu 13-03-14 23:26:36, One Thousand Gnomes wrote:
> > Creating yet more kernel threads to solve this problem is a bit sad. 
> > Is there no way in which we can borrow one of the existing threads?
> 
> schedule_work()
> 
> Surely that's all we need, if we get bunged up for a bit then it ends up
> on a work queue item and everyone with a billion disks and a 9600 baud
> serial port is still happy. Nobody else will ever see it.
  Yes, I know about schedule_work() and I've used it about an year (and ~10
patchset iterations) ago when first trying to solve this problem. The
problem with using workqueues is that if workqueue code ever does printk()
while holding some workqueue lock, we could deadlock when trying to do
schedule_work() from printk which might need the same lock. And people were
objecting to restricting workqueue code to not call printk(). That's why
I've chosen dedicated kthreads were we have tight control over what they
do.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 23:06   ` Andrew Morton
  2014-03-13 23:26     ` One Thousand Gnomes
@ 2014-03-14 13:00     ` Jan Kara
  2014-03-14 13:30     ` Jan Kara
  2 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-14 13:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, LKML, pmladek, Steven Rostedt, Frederic Weisbecker

On Thu 13-03-14 16:06:25, Andrew Morton wrote:
> On Thu, 13 Mar 2014 16:58:37 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> > 
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> > 
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by setting PRINTK_HANDOVER_B bit in
> > printk_handover_state variable and wakes up one of dedicated kthreads.
> > As soon as the printing CPU notices kthread got scheduled and is
> > spinning on console_sem, it drops console_sem and exits
> > console_unlock(). kthread then takes over printing instead.  This way no
> > CPU should spend printing too long even if there is heavy printk
> > traffic.
> 
> Sigh ;)
> 
> Creating yet more kernel threads to solve this problem is a bit sad. 
> Is there no way in which we can borrow one of the existing threads?
> 
> Do we need those threads (or this code) at all on uniprocessor builds?
  No. Actually, I don't start the threads if num_possible_cpus <= 1. You
are right we could save some code on UP as well but is it really worth it?
But you are right most of embedded devices won't need this offloading
although they aren't UP. So maybe I can hide it behind a dedicated config
option?

> I wonder if we should make printk_offload_chars=0 the default, so only
> those people with a squillion disks incur the overhead.  But we'd still
> have the "print" kernel threads, so maybe a different knob to enable the
> feature?  If we do this we should set the knob to "true" for a month or
> two to get testing coverage.
  I agree with default 0 after some initial testing. I think we can just
start the threads first time someone sets printk_offload_chars > 0. OK?

> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2621,6 +2621,22 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
> >  			default: disabled
> >  
> > +	printk.offload_chars=
> > +			Printing to console can be relatively slow especially
> > +			in case of serial console. When there is intensive
> > +			printing happening from several cpus (as is the case
> > +			during boot), a cpu can be spending significant time
> > +			(seconds or more) doing printing. To avoid softlockups,
> > +			lost interrupts, and similar problems other cpus
> > +			will take over printing after the currently printing
> > +			cpu has printed 'printk.offload_chars' characters.
> > +			Higher value means possibly longer interrupt and other
> > +			latencies but lower latency of printing and lower
> > +			chance something goes wrong during system crash and
> > +			important message is not printed.
> > +			Format: <number>
> > +			default: 0 (disabled)
> 
> The default is 1000!
  Thanks for catching this. Will update it (and after some testing we can
really set the default to 0).

> >  	printk.time=	Show timing data prefixed to each printk message line
> >  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
> >  
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> >
> > ...
> >
> > +/*
> > + * Returns true iff there is other cpu waiting to take over printing. This
> > + * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
> > + * printing to some other cpu.
> > + */
> > +static bool cpu_stop_printing(int printed_chars)
> > +{
> > +	/* Oops? Print everything now to maximize chances user will see it */
> > +	if (oops_in_progress)
> > +		return false;
> > +	if (!printk_offload_chars || printed_chars < printk_offload_chars)
> > +		return false;
> > +	/* Someone is spinning on console_sem? Give away to him. */
> > +	if (test_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state))
> > +		return true;
> > +	if (!test_bit(PRINTK_HANDOVER_B, &printk_handover_state)) {
> > +		set_bit(PRINTK_HANDOVER_B, &printk_handover_state);
> > +		/*
> > +		 * Paired with barrier in prepare_to_wait_exclusive() in
> > +		 * printing_task()
> > +		 */
> > +		smp_mb();
> > +		wake_up(&print_queue);
> > +	}
> > +	return false;
> > +}
> 
> What happens if someone does a massive printk really early, perhaps
> before the CPU scheduler is up and running?  Presumably things still
> work OK, and the output will appear later in boot?
  We never hand over printing until there is kthread spinning on
console_sem. So until system has booted enough, printk will work the same
wait it did so far.

> It might be sane to return false until system_state==something, unsure.
  Frankly, I don't think those few tests saved are worth it...

> >  /**
> >   * console_unlock - unlock the console system
> >   *
> >   * Releases the console_lock which the caller holds on the console system
> >   * and the console driver list.
> >   *
> > - * While the console_lock was held, console output may have been buffered
> > - * by printk().  If this is the case, console_unlock(); emits
> > - * the output prior to releasing the lock.
> > + * While the console_lock was held, console output may have been buffered by
> > + * printk(). If this is the case, console_unlock() emits the output prior to
> > + * releasing the lock. However we need not write all the data in the buffer if
> > + * we would hog the CPU for too long. In such case we try to hand over printing
> > + * to a different cpu.
> >   *
> >   * If there is output waiting, we wake /dev/kmsg and syslog() users.
> >   *
> > @@ -2025,6 +2127,8 @@ void console_unlock(void)
> >  	unsigned long flags;
> >  	bool wake_klogd = false;
> >  	bool retry;
> > +	bool hand_over = false;
> > +	int printed_chars = 0;
> >  
> >  	if (console_suspended) {
> >  		up(&console_sem);
> > @@ -2041,6 +2145,11 @@ again:
> >  		size_t len;
> >  		int level;
> >  
> > +		if (cpu_stop_printing(printed_chars)) {
> > +			hand_over = true;
> > +			break;
> > +		}
> > +
> >  		raw_spin_lock_irqsave(&logbuf_lock, flags);
> >  		if (seen_seq != log_next_seq) {
> >  			wake_klogd = true;
> > @@ -2054,8 +2163,10 @@ again:
> >  			console_prev = 0;
> >  		}
> >  skip:
> > -		if (console_seq == log_next_seq)
> > +		if (console_seq == log_next_seq) {
> > +			raw_spin_unlock(&logbuf_lock);
> >  			break;
> > +		}
> >  
> >  		msg = log_from_idx(console_idx);
> >  		if (msg->flags & LOG_NOCONS) {
> > @@ -2086,31 +2197,39 @@ skip:
> >  		stop_critical_timings();	/* don't trace print latency */
> >  		call_console_drivers(level, text, len);
> >  		start_critical_timings();
> > +		printed_chars += len;
> >  		local_irq_restore(flags);
> >  	}
> > -	console_locked = 0;
> > -	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
> >  
> >  	/* Release the exclusive_console once it is used */
> >  	if (unlikely(exclusive_console))
> >  		exclusive_console = NULL;
> >  
> > -	raw_spin_unlock(&logbuf_lock);
> > -
> > +	/* Save modification of printk_handover_state in the common fast path */
> > +	if (test_bit(PRINTK_HANDOVER_B, &printk_handover_state))
> > +		clear_bit(PRINTK_HANDOVER_B, &printk_handover_state);
> 
> If this is safe then we can use __clear_bit() ;)
  Hum, not sure. We know noone will change PRINTK_HANDOVER_B since we hold
console_sem. That's why the code is safe. OTOH someone can change
PRINTK_CONSOLE_SPIN_B independently and if __clear_bit() is implemented as
RMW cycle of the whole long, we could miss clearing of *that* bit.

> > +	console_locked = 0;
> > +	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
> >  	up(&console_sem);
> >  
> >  	/*
> > -	 * Someone could have filled up the buffer again, so re-check if there's
> > -	 * something to flush. In case we cannot trylock the console_sem again,
> > -	 * there's a new owner and the console_unlock() from them will do the
> > -	 * flush, no worries.
> > +	 * Subtlety: We have interrupts disabled iff hand_over == false (to
> > +	 * save one cli/sti pair in the fast path.
> >  	 */
> > -	raw_spin_lock(&logbuf_lock);
> > -	retry = console_seq != log_next_seq;
> > -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> > +	if (!hand_over) {
> > +		/*
> > +		 * Someone could have filled up the buffer again, so re-check
> > +		 * if there's something to flush. In case we cannot trylock the
> > +		 * console_sem again, there's a new owner and the
> > +		 * console_unlock() from them will do the flush, no worries.
> > +		 */
> > +		raw_spin_lock(&logbuf_lock);
> > +		retry = console_seq != log_next_seq;
> > +		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> >  
> > -	if (retry && console_trylock())
> > -		goto again;
> > +		if (retry && console_trylock())
> > +			goto again;
> > +	}
> >  
> >  	if (wake_klogd)
> >  		wake_up_klogd();
> >
> > ...
> >
> > @@ -2434,6 +2583,19 @@ static int __init printk_late_init(void)
> >  		}
> >  	}
> >  	hotcpu_notifier(console_cpu_notify, 0);
> > +
> > +	/* Does any handover of printing have any sence? */
> 
> "Does handover of printing make any sense?"
  Thanks.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6] printk: Hand over printing to console if printing too long
  2014-03-13 23:06   ` Andrew Morton
  2014-03-13 23:26     ` One Thousand Gnomes
  2014-03-14 13:00     ` Jan Kara
@ 2014-03-14 13:30     ` Jan Kara
  2 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2014-03-14 13:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, LKML, pmladek, Steven Rostedt, Frederic Weisbecker

On Thu 13-03-14 16:06:25, Andrew Morton wrote:
> On Thu, 13 Mar 2014 16:58:37 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> > 
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> > 
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by setting PRINTK_HANDOVER_B bit in
> > printk_handover_state variable and wakes up one of dedicated kthreads.
> > As soon as the printing CPU notices kthread got scheduled and is
> > spinning on console_sem, it drops console_sem and exits
> > console_unlock(). kthread then takes over printing instead.  This way no
> > CPU should spend printing too long even if there is heavy printk
> > traffic.
> 
> Sigh ;)
> 
> Creating yet more kernel threads to solve this problem is a bit sad. 
> Is there no way in which we can borrow one of the existing threads?
  I realized I didn't react to the above. I've checked which kernel threads
are running on my system and I can see:
rcu related threads, ksoftirqd, various fs threads, kauditd - nothing
really seems as a good fit. Then there are various workqueue related
threads - these aren't really suitable either. Finally I can see migration
threads (from kernel/stop_machine.c) - I'm not sure I want to mess with
these - especially since they can run any function somebody passes to them
and when that function calls printk() things might get interesting.

So I would go for starting the threads only if printk.offload_chars ever
gets set to value > 0.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-03-14 13:30 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-13 15:58 [PATCH 0/6 v2] printk: Cleanups and softlockup avoidance Jan Kara
2014-03-13 15:58 ` [PATCH 1/6] printk: Remove outdated comment Jan Kara
2014-03-13 15:58 ` [PATCH 2/6] printk: Release lockbuf_lock before calling console_trylock_for_printk() Jan Kara
2014-03-13 15:58 ` [PATCH 3/6] printk: Enable interrupts " Jan Kara
2014-03-13 15:58 ` [PATCH 4/6] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
2014-03-13 15:58 ` [PATCH 5/6] printk: Hand over printing to console if printing too long Jan Kara
2014-03-13 23:06   ` Andrew Morton
2014-03-13 23:26     ` One Thousand Gnomes
2014-03-14 12:26       ` Jan Kara
2014-03-14 13:00     ` Jan Kara
2014-03-14 13:30     ` Jan Kara
2014-03-13 15:58 ` [PATCH 6/6] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
2014-03-13 23:09   ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.