linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4 v6] Avoid softlockups in console_unlock()
@ 2013-11-07 21:48 Jan Kara
  2013-11-07 21:48 ` [PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-07 21:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, Steven Rostedt, Jan Kara

  Hello,

  This is the next iteration of my printk patchset. Since v5 I've made the
limit for printing configurable via a kernel parameter and let it default to
0, so unless the user sets printk.offload_chars on the kernel command line,
there will be no difference from the current printk behavior.

Summary:

These patches avoid softlockups when a CPU gets caught in console_unlock() for
a long time during heavy printing from other CPUs. As discussed in patch 3/4,
it isn't enough to just silence the watchdog: if a CPU spends too long in
console_unlock(), RCU will also complain, other CPUs can be blocked waiting
for the printing CPU to process IPIs, and even a disk can be offlined because
interrupts would be disabled for too long.

This patch series solves the problem by stopping printing in console_unlock()
after X (tunable) characters and postponing the rest of the printing to irq
work. To avoid hogging a single CPU (irq work gets processed on the same CPU
where it was queued, so it doesn't really help to reduce the printing load on
that CPU), we introduce a new type of lazy irq work - IRQ_WORK_UNBOUND - which
can be processed by any CPU.

The patch series has survived my testing without any softlockup reports. I've
tested running sysrq-t (lots of printk output) while inserting modules (to
generate IPIs and also some printk traffic) and also running two delayed works
printing 10 KB of text each. All this was with a simulated 9600 baud serial
console.

								Honza

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead
  2013-11-07 21:48 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
@ 2013-11-07 21:48 ` Jan Kara
  2013-11-07 21:48 ` [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-07 21:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, Steven Rostedt, Jan Kara

From: Steven Rostedt <rostedt@goodmis.org>

To prevent deadlocks from doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one whose rq lock is already held.

What printk_sched() does is save the printk write in a per-cpu buffer
and set the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.

There are a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.

To remove this, printk_sched() can use the printk buffer instead, and
delay the console_trylock()/console_unlock() to the queued work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the
up(&console_sem) may do a wake up of any pending waiters. This must be
avoided while holding the logbuf_lock.

Luckily, there are not many places that do the unlock or hold the
logbuf_lock. By moving things around a little, the console_sem can be
released without ever holding the logbuf_lock, and we can safely have
printk_sched() use the printk buffer directly.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 32 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b4e8500afdb3..572fb1f438b6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -208,6 +208,9 @@ struct printk_log {
 /*
  * The logbuf_lock protects kmsg buffer, indices, counters. It is also
  * used in interesting ways to provide interlocking in console_unlock();
+ * This can be taken within the scheduler's rq lock. It must be released
+ * before calling console_unlock() or anything else that might wake up
+ * a process.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
@@ -1343,27 +1346,43 @@ static inline int can_use_console(unsigned int cpu)
  * interrupts disabled. It should return with 'lockbuf_lock'
  * released but interrupts still disabled.
  */
-static int console_trylock_for_printk(unsigned int cpu)
+static int console_trylock_for_printk(unsigned int cpu, bool in_sched)
 	__releases(&logbuf_lock)
 {
 	int retval = 0, wake = 0;
 
-	if (console_trylock()) {
-		retval = 1;
+	/* if called from the scheduler, we can not call up() */
+	if (in_sched)
+		goto out;
 
-		/*
-		 * If we can't use the console, we need to release
-		 * the console semaphore by hand to avoid flushing
-		 * the buffer. We need to hold the console semaphore
-		 * in order to do this test safely.
-		 */
-		if (!can_use_console(cpu)) {
-			console_locked = 0;
-			wake = 1;
-			retval = 0;
-		}
+	if (down_trylock(&console_sem))
+		goto out;
+
+	/*
+	 * If we can't use the console, we need to release
+	 * the console semaphore by hand to avoid flushing
+	 * the buffer. We need to hold the console semaphore
+	 * in order to do this test safely.
+	 */
+	if (console_suspended || !can_use_console(cpu)) {
+		wake = 1;
+		goto out;
 	}
+
+	/* console is now locked */
+
+	console_locked = 1;
+	console_may_schedule = 0;
+	mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_);
+
+	retval = 1;
+
+out:
 	logbuf_cpu = UINT_MAX;
+	/*
+	 * The logbuf_lock must not be held when doing a wake up,
+	 * which the up(&console_sem) can do.
+	 */
 	raw_spin_unlock(&logbuf_lock);
 	if (wake)
 		up(&console_sem);
@@ -1495,11 +1514,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 	static int recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
-	size_t text_len;
+	size_t text_len = 0;
 	enum log_flags lflags = 0;
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	bool in_sched = false;
+
+	if (level == -2) {
+		level = -1;
+		in_sched = true;
+	}
 
 	boot_delay_msec(level);
 	printk_delay();
@@ -1545,7 +1570,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
 	 */
-	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+	if (in_sched)
+		text_len = scnprintf(text, sizeof(textbuf),
+				     KERN_WARNING "[sched_delayed] ");
+
+	text_len += vscnprintf(text + text_len,
+			       sizeof(textbuf) - text_len, fmt, args);
 
 	/* mark and strip a trailing newline */
 	if (text_len && text[text_len-1] == '\n') {
@@ -1620,7 +1650,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * The console_trylock_for_printk() function will release 'logbuf_lock'
 	 * regardless of whether it actually gets the console semaphore or not.
 	 */
-	if (console_trylock_for_printk(this_cpu))
+	if (console_trylock_for_printk(this_cpu, in_sched))
 		console_unlock();
 
 	lockdep_on();
@@ -2435,21 +2465,19 @@ late_initcall(printk_late_init);
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_BUF_SIZE		512
-
 #define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_SCHED	0x02
+#define PRINTK_PENDING_OUTPUT	0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
-static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
 
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_SCHED) {
-		char *buf = __get_cpu_var(printk_sched_buf);
-		printk(KERN_WARNING "[sched_delayed] %s", buf);
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
@@ -2473,21 +2501,15 @@ void wake_up_klogd(void)
 
 int printk_sched(const char *fmt, ...)
 {
-	unsigned long flags;
 	va_list args;
-	char *buf;
 	int r;
 
-	local_irq_save(flags);
-	buf = __get_cpu_var(printk_sched_buf);
-
 	va_start(args, fmt);
-	r = vsnprintf(buf, PRINTK_BUF_SIZE, fmt, args);
+	r = vprintk_emit(0, -2, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
+	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
-	local_irq_restore(flags);
 
 	return r;
 }
-- 
1.8.1.4



* [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 21:48 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
  2013-11-07 21:48 ` [PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
@ 2013-11-07 21:48 ` Jan Kara
  2013-11-07 22:13   ` Frederic Weisbecker
  2013-11-07 21:48 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
  2013-11-07 21:48 ` [PATCH 4/4] printk: Use unbound irq work for printing and waking Jan Kara
  3 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-07 21:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, Steven Rostedt, Jan Kara

Provide a new irq work flag - IRQ_WORK_UNBOUND - meaning that the work can
be processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
simple and we don't have to pick any particular cpu to do the work. We
just do the work from a timer tick on whichever cpu gets to it first.
This is useful as a lightweight and simple code path, without locking or
other dependencies, for offloading work to another cpu when possible.

We will use this type of irq work to guarantee forward progress of
printing to a (serial) console when printing on one cpu would cause
interrupts to be disabled for too long.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c        | 41 +++++++++++++++++++++++++----------------
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 66017028dcb3..ca07a16355ed 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -16,6 +16,8 @@
 #define IRQ_WORK_BUSY		2UL
 #define IRQ_WORK_FLAGS		3UL
 #define IRQ_WORK_LAZY		4UL /* Doesn't want IPI, wait for tick */
+#define __IRQ_WORK_UNBOUND      8UL /* Use IRQ_WORK_UNBOUND instead! */
+#define IRQ_WORK_UNBOUND	(__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu can process this work */
 
 struct irq_work {
 	unsigned long flags;
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 55fcce6065cf..b06350b63c67 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,9 @@
 static DEFINE_PER_CPU(struct llist_head, irq_work_list);
 static DEFINE_PER_CPU(int, irq_work_raised);
 
+/* List of irq-work any CPU can pick up */
+static LLIST_HEAD(unbound_irq_work_list);
+
 /*
  * Claim the entry so that no one else will poke at it.
  */
@@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-	llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
+	if (work->flags & __IRQ_WORK_UNBOUND)
+		llist_add(&work->llnode, &unbound_irq_work_list);
+	else
+		llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
 
 	/*
 	 * If the work is not "lazy" or the tick is stopped, raise the irq
 	 * work interrupt (if supported by the arch), otherwise, just wait
-	 * for the next tick.
+	 * for the next tick. We do this even for unbound work to make sure
+	 * *some* CPU will be doing the work.
 	 */
 	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
 		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
@@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
 	return true;
 }
 
-static void __irq_work_run(void)
+static void process_irq_work_list(struct llist_head *llhead)
 {
 	unsigned long flags;
 	struct irq_work *work;
-	struct llist_head *this_list;
 	struct llist_node *llnode;
 
-
-	/*
-	 * Reset the "raised" state right before we check the list because
-	 * an NMI may enqueue after we find the list empty from the runner.
-	 */
-	__this_cpu_write(irq_work_raised, 0);
-	barrier();
-
-	this_list = &__get_cpu_var(irq_work_list);
-	if (llist_empty(this_list))
+	if (llist_empty(llhead))
 		return;
 
 	BUG_ON(!irqs_disabled());
-
-	llnode = llist_del_all(this_list);
+	llnode = llist_del_all(llhead);
 	while (llnode != NULL) {
 		work = llist_entry(llnode, struct irq_work, llnode);
 
@@ -146,6 +142,19 @@ static void __irq_work_run(void)
 	}
 }
 
+static void __irq_work_run(void)
+{
+	/*
+	 * Reset the "raised" state right before we check the list because
+	 * an NMI may enqueue after we find the list empty from the runner.
+	 */
+	__this_cpu_write(irq_work_raised, 0);
+	barrier();
+
+	process_irq_work_list(&__get_cpu_var(irq_work_list));
+	process_irq_work_list(&unbound_irq_work_list);
+}
+
 /*
  * Run the irq_work entries on this cpu. Requires to be ran from hardirq
  * context with local IRQs disabled.
-- 
1.8.1.4



* [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 21:48 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
  2013-11-07 21:48 ` [PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
  2013-11-07 21:48 ` [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu Jan Kara
@ 2013-11-07 21:48 ` Jan Kara
  2013-11-07 22:43   ` Frederic Weisbecker
  2013-11-07 21:48 ` [PATCH 4/4] printk: Use unbound irq work for printing and waking Jan Kara
  3 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-07 21:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, Steven Rostedt, Jan Kara

A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and a serial console makes printing slow. Even though serial console
drivers call touch_nmi_watchdog(), this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artificial testing I can also easily trigger a situation where a
disk disappears from the system, apparently because an interrupt from it
wasn't served for too long. This is why just silencing watchdogs isn't a
reliable solution to the problem and we simply have to avoid spending
too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU once we have already printed over X characters in
the current console_unlock() invocation. This is a crude heuristic, but
measuring the time we spend printing doesn't seem to be really viable -
we cannot rely on high-resolution time being available, and with
interrupts disabled jiffies are not updated. The user can tune the value
X via the printk.offload_chars kernel parameter.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt | 17 ++++++++++
 kernel/printk/printk.c              | 68 +++++++++++++++++++++++++++++++------
 2 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736d55fe..579815bd90eb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2540,6 +2540,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
+	printk.offload_chars=
+			Printing to the console can be relatively slow,
+			especially in the case of a serial console. When there
+			is intensive printing from several cpus (as is the
+			case during boot), a cpu can spend significant time
+			(seconds or more) printing. To avoid softlockups, lost
+			interrupts, and similar problems, a cpu stops printing
+			to the console after printing 'printk.offload_chars'
+			characters; another cpu will then continue printing.
+			A higher value means possibly longer interrupt and
+			other latencies, but lower printing latency and a
+			lower chance that something goes wrong during a system
+			crash and an important message is not printed.
+			Format: <number>
+			default: 0 (disabled)
+
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 572fb1f438b6..35bb70ea6427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -253,6 +253,18 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How many characters we can print in one call of printk before offloading
+ * the printing work (0 means no limit). Tunable via the printk.offload_chars
+ * module parameter.
+ */
+static unsigned int __read_mostly printk_offload_chars = 0;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+	" cpu after this number of characters");
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -2026,31 +2038,43 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	return printk_offload_chars && printed_chars > printk_offload_chars;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, __console_unlock() emits the output prior to
+ * releasing the lock. However we need not write all the data in the buffer if
+ * we would hog the CPU for too long. The function returns true if there is
+ * more data in the buffer that needs printing.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * __console_unlock(); may be called from any context.
  */
-void console_unlock(void)
+static bool __console_unlock(void)
 {
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
-		return;
+		return false;
 	}
 
 	console_may_schedule = 0;
@@ -2078,6 +2102,8 @@ again:
 skip:
 		if (console_seq == log_next_seq)
 			break;
+		if (cpu_stop_printing(printed_chars))
+			break;
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2108,6 +2134,7 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
 		start_critical_timings();
+		printed_chars += len;
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
@@ -2131,13 +2158,20 @@ skip:
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (retry && !cpu_stop_printing(printed_chars)) {
+		if (console_trylock())
+			goto again;
+		/*
+		 * Someone else is printing so the caller doesn't have to
+		 * schedule irq work
+		 */
+		retry = false;
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
+	return retry;
 }
-EXPORT_SYMBOL(console_unlock);
 
 /**
  * console_conditional_schedule - yield the CPU if required
@@ -2515,6 +2549,20 @@ int printk_sched(const char *fmt, ...)
 }
 
 /*
+ * This is a wrapper for __console_unlock() that makes sure the rest of buffer
+ * is printed in future.
+ */
+void console_unlock(void)
+{
+	if (__console_unlock()) {
+		/* Leave the rest of printing for a timer tick */
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	}
+}
+EXPORT_SYMBOL(console_unlock);
+
+/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
1.8.1.4



* [PATCH 4/4] printk: Use unbound irq work for printing and waking
  2013-11-07 21:48 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
                   ` (2 preceding siblings ...)
  2013-11-07 21:48 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
@ 2013-11-07 21:48 ` Jan Kara
  3 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-07 21:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, Steven Rostedt, Jan Kara

Now that the per-cpu printk buffers are gone, there's no need to have printk
flags or a printk irq_work per cpu. Just make printk_pending a single
variable operated on by atomic operations and have a single unbound irq work
do the waking and printing. This has the advantage that any cpu can do the
printing / wakeup work, thus lowering the latency of printing and better
distributing the printing work over cpus.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 35bb70ea6427..dc314f074e8f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2499,36 +2499,41 @@ late_initcall(printk_late_init);
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
+#define PRINTK_PENDING_WAKEUP	0
+#define PRINTK_PENDING_OUTPUT	1
 
-static DEFINE_PER_CPU(int, printk_pending);
+static unsigned long printk_pending;
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_irq_work_func(struct irq_work *irq_work)
 {
-	int pending = __this_cpu_xchg(printk_pending, 0);
+	if (printk_pending) {
+		unsigned long pending = xchg(&printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
+		if (test_bit(PRINTK_PENDING_OUTPUT, &pending)) {
+			/*
+			 * If trylock fails, someone else is doing the
+			 * printing
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
+		if (test_bit(PRINTK_PENDING_WAKEUP, &pending))
+			wake_up_interruptible(&log_wait);
+	}
 }
 
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
+static struct irq_work printk_irq_work = {
+	.func = printk_irq_work_func,
+	.flags = IRQ_WORK_UNBOUND,
 };
 
 void wake_up_klogd(void)
 {
 	preempt_disable();
 	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+		set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
+		irq_work_queue(&printk_irq_work);
 	}
 	preempt_enable();
 }
@@ -2542,8 +2547,8 @@ int printk_sched(const char *fmt, ...)
 	r = vprintk_emit(0, -2, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
+	irq_work_queue(&printk_irq_work);
 
 	return r;
 }
@@ -2556,8 +2561,8 @@ void console_unlock(void)
 {
 	if (__console_unlock()) {
 		/* Leave the rest of printing for a timer tick */
-		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+		set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
+		irq_work_queue(&printk_irq_work);
 	}
 }
 EXPORT_SYMBOL(console_unlock);
-- 
1.8.1.4



* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 21:48 ` [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu Jan Kara
@ 2013-11-07 22:13   ` Frederic Weisbecker
  2013-11-07 22:19     ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 22:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

2013/11/7 Jan Kara <jack@suse.cz>:
> Provide new irq work flag - IRQ_WORK_UNBOUND - meaning that can be
> processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> simple and we don't have to pick any particular cpu to do the work. We
> just do the work from a timer tick on whichever cpu it happens first.
> This is useful as a lightweight and simple code path without locking or
> other dependencies to offload work to other cpu if possible.
>
> We will use this type of irq work to make a guarantee of forward
> progress of printing to a (serial) console when printing on one cpu
> would cause interrupts to be disabled for too long.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  include/linux/irq_work.h |  2 ++
>  kernel/irq_work.c        | 41 +++++++++++++++++++++++++----------------
>  2 files changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 66017028dcb3..ca07a16355ed 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -16,6 +16,8 @@
>  #define IRQ_WORK_BUSY          2UL
>  #define IRQ_WORK_FLAGS         3UL
>  #define IRQ_WORK_LAZY          4UL /* Doesn't want IPI, wait for tick */
> +#define __IRQ_WORK_UNBOUND      8UL /* Use IRQ_WORK_UNBOUND instead! */
> +#define IRQ_WORK_UNBOUND       (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu can process this work */
>
>  struct irq_work {
>         unsigned long flags;
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 55fcce6065cf..b06350b63c67 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -22,6 +22,9 @@
>  static DEFINE_PER_CPU(struct llist_head, irq_work_list);
>  static DEFINE_PER_CPU(int, irq_work_raised);
>
> +/* List of irq-work any CPU can pick up */
> +static LLIST_HEAD(unbound_irq_work_list);
> +
>  /*
>   * Claim the entry so that no one else will poke at it.
>   */
> @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
>         /* Queue the entry and raise the IPI if needed. */
>         preempt_disable();
>
> -       llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> +       if (work->flags & __IRQ_WORK_UNBOUND)
> +               llist_add(&work->llnode, &unbound_irq_work_list);
> +       else
> +               llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
>
>         /*
>          * If the work is not "lazy" or the tick is stopped, raise the irq
>          * work interrupt (if supported by the arch), otherwise, just wait
> -        * for the next tick.
> +        * for the next tick. We do this even for unbound work to make sure
> +        * *some* CPU will be doing the work.
>          */
>         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
>                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
>         return true;
>  }
>
> -static void __irq_work_run(void)
> +static void process_irq_work_list(struct llist_head *llhead)
>  {
>         unsigned long flags;
>         struct irq_work *work;
> -       struct llist_head *this_list;
>         struct llist_node *llnode;
>
> -
> -       /*
> -        * Reset the "raised" state right before we check the list because
> -        * an NMI may enqueue after we find the list empty from the runner.
> -        */
> -       __this_cpu_write(irq_work_raised, 0);
> -       barrier();
> -
> -       this_list = &__get_cpu_var(irq_work_list);
> -       if (llist_empty(this_list))
> +       if (llist_empty(llhead))
>                 return;
>
>         BUG_ON(!irqs_disabled());
> -
> -       llnode = llist_del_all(this_list);
> +       llnode = llist_del_all(llhead);
>         while (llnode != NULL) {
>                 work = llist_entry(llnode, struct irq_work, llnode);
>
> @@ -146,6 +142,19 @@ static void __irq_work_run(void)
>         }
>  }
>
> +static void __irq_work_run(void)
> +{
> +       /*
> +        * Reset the "raised" state right before we check the list because
> +        * an NMI may enqueue after we find the list empty from the runner.
> +        */
> +       __this_cpu_write(irq_work_raised, 0);
> +       barrier();
> +
> +       process_irq_work_list(&__get_cpu_var(irq_work_list));
> +       process_irq_work_list(&unbound_irq_work_list);
> +}
> +

But then, who's going to process that work if every CPU is idle?


* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:13   ` Frederic Weisbecker
@ 2013-11-07 22:19     ` Jan Kara
  2013-11-07 22:23       ` Frederic Weisbecker
  2013-11-07 22:32       ` Frederic Weisbecker
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-07 22:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara <jack@suse.cz>:
> > Provide new irq work flag - IRQ_WORK_UNBOUND - meaning that can be
> > processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> > simple and we don't have to pick any particular cpu to do the work. We
> > just do the work from a timer tick on whichever cpu it happens first.
> > This is useful as a lightweight and simple code path without locking or
> > other dependencies to offload work to other cpu if possible.
> >
> > We will use this type of irq work to make a guarantee of forward
> > progress of printing to a (serial) console when printing on one cpu
> > would cause interrupts to be disabled for too long.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  include/linux/irq_work.h |  2 ++
> >  kernel/irq_work.c        | 41 +++++++++++++++++++++++++----------------
> >  2 files changed, 27 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 66017028dcb3..ca07a16355ed 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -16,6 +16,8 @@
> >  #define IRQ_WORK_BUSY          2UL
> >  #define IRQ_WORK_FLAGS         3UL
> >  #define IRQ_WORK_LAZY          4UL /* Doesn't want IPI, wait for tick */
> > +#define __IRQ_WORK_UNBOUND      8UL /* Use IRQ_WORK_UNBOUND instead! */
> > +#define IRQ_WORK_UNBOUND       (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu can process this work */
> >
> >  struct irq_work {
> >         unsigned long flags;
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index 55fcce6065cf..b06350b63c67 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -22,6 +22,9 @@
> >  static DEFINE_PER_CPU(struct llist_head, irq_work_list);
> >  static DEFINE_PER_CPU(int, irq_work_raised);
> >
> > +/* List of irq-work any CPU can pick up */
> > +static LLIST_HEAD(unbound_irq_work_list);
> > +
> >  /*
> >   * Claim the entry so that no one else will poke at it.
> >   */
> > @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
> >         /* Queue the entry and raise the IPI if needed. */
> >         preempt_disable();
> >
> > -       llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> > +       if (work->flags & __IRQ_WORK_UNBOUND)
> > +               llist_add(&work->llnode, &unbound_irq_work_list);
> > +       else
> > +               llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
> >
> >         /*
> >          * If the work is not "lazy" or the tick is stopped, raise the irq
> >          * work interrupt (if supported by the arch), otherwise, just wait
> > -        * for the next tick.
> > +        * for the next tick. We do this even for unbound work to make sure
> > +        * *some* CPU will be doing the work.
> >          */
> >         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> >                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
> >         return true;
> >  }
> >
> > -static void __irq_work_run(void)
> > +static void process_irq_work_list(struct llist_head *llhead)
> >  {
> >         unsigned long flags;
> >         struct irq_work *work;
> > -       struct llist_head *this_list;
> >         struct llist_node *llnode;
> >
> > -
> > -       /*
> > -        * Reset the "raised" state right before we check the list because
> > -        * an NMI may enqueue after we find the list empty from the runner.
> > -        */
> > -       __this_cpu_write(irq_work_raised, 0);
> > -       barrier();
> > -
> > -       this_list = &__get_cpu_var(irq_work_list);
> > -       if (llist_empty(this_list))
> > +       if (llist_empty(llhead))
> >                 return;
> >
> >         BUG_ON(!irqs_disabled());
> > -
> > -       llnode = llist_del_all(this_list);
> > +       llnode = llist_del_all(llhead);
> >         while (llnode != NULL) {
> >                 work = llist_entry(llnode, struct irq_work, llnode);
> >
> > @@ -146,6 +142,19 @@ static void __irq_work_run(void)
> >         }
> >  }
> >
> > +static void __irq_work_run(void)
> > +{
> > +       /*
> > +        * Reset the "raised" state right before we check the list because
> > +        * an NMI may enqueue after we find the list empty from the runner.
> > +        */
> > +       __this_cpu_write(irq_work_raised, 0);
> > +       barrier();
> > +
> > +       process_irq_work_list(&__get_cpu_var(irq_work_list));
> > +       process_irq_work_list(&unbound_irq_work_list);
> > +}
> > +
> 
> But then, who's going to process that work if every CPU is idle?
  Have a look into irq_work_queue(). There is:
        /*
         * If the work is not "lazy" or the tick is stopped, raise the irq
         * work interrupt (if supported by the arch), otherwise, just wait
         * for the next tick. We do this even for unbound work to make sure
         * *some* CPU will be doing the work.
         */
        if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
                if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
                        arch_irq_work_raise();
        }

  So we raise an interrupt if no timer would be ticking (which is what I
suppose you mean by "CPU is idle"). Nothing about that is changed by my
patches...
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:19     ` Jan Kara
@ 2013-11-07 22:23       ` Frederic Weisbecker
  2013-11-07 22:50         ` Jan Kara
  2013-11-07 22:32       ` Frederic Weisbecker
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 22:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
>   Have a look into irq_work_queue(). There is:
>         /*
>          * If the work is not "lazy" or the tick is stopped, raise the irq
>          * work interrupt (if supported by the arch), otherwise, just wait
>          * for the next tick. We do this even for unbound work to make sure
>          * *some* CPU will be doing the work.
>          */
>         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
>                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
>                         arch_irq_work_raise();
>         }
> 
>   So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

Ok but we raise that interrupt locally, not to the other CPUs.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:19     ` Jan Kara
  2013-11-07 22:23       ` Frederic Weisbecker
@ 2013-11-07 22:32       ` Frederic Weisbecker
  1 sibling, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 22:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPU is idle?
>   Have a look into irq_work_queue(). There is:
>         /*
>          * If the work is not "lazy" or the tick is stopped, raise the irq
>          * work interrupt (if supported by the arch), otherwise, just wait
>          * for the next tick. We do this even for unbound work to make sure
>          * *some* CPU will be doing the work.
>          */
>         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
>                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
>                         arch_irq_work_raise();
>         }
> 
>   So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

That said, I agree that it would be nice to have smp_call_function_many()
support non-waiting calls, something based on llist, which would be less
deadlock-prone to begin with.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 21:48 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
@ 2013-11-07 22:43   ` Frederic Weisbecker
  2013-11-07 22:57     ` Jan Kara
  2013-11-07 22:59     ` Steven Rostedt
  0 siblings, 2 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 22:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

2013/11/7 Jan Kara <jack@suse.cz>:
> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and a serial console makes printing slow. Even though serial console
> drivers call touch_nmi_watchdog(), this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artificial testing I can also easily trigger a situation where a
> disk disappears from the system, apparently because its interrupts were
> not served for too long. This is why just silencing watchdogs isn't a
> reliable solution to the problem and we simply have to avoid spending
> too long in console_unlock() with interrupts disabled.
>
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU once we have already printed over X characters in
> the current console_unlock() invocation. This is a crude heuristic, but
> measuring the time spent printing doesn't seem really viable - we cannot
> rely on high-resolution time being available, and with interrupts
> disabled jiffies are not updated. The user can tune the value X via the
> printk.offload_chars kernel parameter.
>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Jan Kara <jack@suse.cz>

When a message takes tens of seconds to be printed, it usually means
we are in trouble somehow :)
I wonder what printk source can trigger such a high volume.

Maybe cutting some huge messages into smaller chunks could help? That
would re-enable interrupts between each call.

It's hard to tell without the context, but using other CPUs for
rescuing doesn't look like a good solution. What if the issue happens
in UP to begin with?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:23       ` Frederic Weisbecker
@ 2013-11-07 22:50         ` Jan Kara
  2013-11-07 22:54           ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-07 22:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > But then, who's going to process that work if every CPU is idle?
> >   Have a look into irq_work_queue(). There is:
> >         /*
> >          * If the work is not "lazy" or the tick is stopped, raise the irq
> >          * work interrupt (if supported by the arch), otherwise, just wait
> >          * for the next tick. We do this even for unbound work to make sure
> >          * *some* CPU will be doing the work.
> >          */
> >         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> >                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> >                         arch_irq_work_raise();
> >         }
> > 
> >   So we raise an interrupt if there would be no timer ticking (which is
> > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > patches...
> 
> Ok but we raise that interrupt locally, not to the other CPUs.
  True, but that doesn't really matter in this case. Any CPU (including
the local one) can handle the unbound work. So by the definition of
unbound work, things are OK.

Regarding my use for printk - if all (other) CPUs are idle then we can
easily afford to keep the current CPU busy printing; that's not a problem.
There's nothing else to do then but print what remains in the printk
buffer...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:50         ` Jan Kara
@ 2013-11-07 22:54           ` Frederic Weisbecker
  2013-11-07 23:01             ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 22:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > But then, who's going to process that work if every CPU is idle?
> > >   Have a look into irq_work_queue(). There is:
> > >         /*
> > >          * If the work is not "lazy" or the tick is stopped, raise the irq
> > >          * work interrupt (if supported by the arch), otherwise, just wait
> > >          * for the next tick. We do this even for unbound work to make sure
> > >          * *some* CPU will be doing the work.
> > >          */
> > >         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > >                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > >                         arch_irq_work_raise();
> > >         }
> > > 
> > >   So we raise an interrupt if there would be no timer ticking (which is
> > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > patches...
> > 
> > Ok but we raise that interrupt locally, not to the other CPUs.
>   True, but that doesn't really matter in this case. Any CPU (including the
> local one) can handle the unbound work. So from the definition of the
> unbound work things are OK.

I don't see how that can be OK. You want to offload work because the local
CPU can't handle it, right? If the local CPU can handle it, you can just
use local irq works.

> 
> Regarding my use for printk - if all (other) CPUs are idle then we can
> easily afford making the current cpu busy printing, that's not a problem.
> There's nothing else to do than to print what's remaining in the printk
> buffer...

So if the current CPU can handle it, what is the problem?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 22:43   ` Frederic Weisbecker
@ 2013-11-07 22:57     ` Jan Kara
  2013-11-07 23:21       ` Frederic Weisbecker
  2013-11-11 21:54       ` Pavel Machek
  2013-11-07 22:59     ` Steven Rostedt
  1 sibling, 2 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-07 22:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara <jack@suse.cz>:
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artificial testing I can also easily trigger a situation when disk
> > disappears from the system apparently because interrupt from it wasn't
> > served for too long. This is why just silencing watchdogs isn't a
> > reliable solution to the problem and we simply have to avoid spending
> > too long in console_unlock() with interrupts disabled.
> >
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over X characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. User can tune the value X via
> > printk.offload_chars kernel parameter.
> >
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.
  Machines with tens of processors and thousands of SCSI devices. When
device discovery happens at boot, all processors are busily reporting new
SCSI devices and one poor loser is bound to do the printing forever and
ever until the machine dies...

Or try running sysrq-t on a large machine with a serial console connected.
The machine will die because of lockups (although in this case I agree it
is more a problem of sysrq-t doing lots of printing in interrupt-disabled
context).

> May be cutting some huge message into smaller chunks could help? That
> would re enable interrupts between each call.
> 
> It's hard to tell without the context, but using other CPUs for
> rescuing doesn't look like a good solution. What if the issue happens
> in UP to begin with?
  The real trouble in practice is that while one CPU is doing the
printing, other CPUs keep appending to the printk buffer, so that CPU can
be printing for a *long* time. Offloading the work to other CPUs, which
are also appending messages, seems like a fair thing to do.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 22:43   ` Frederic Weisbecker
  2013-11-07 22:57     ` Jan Kara
@ 2013-11-07 22:59     ` Steven Rostedt
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2013-11-07 22:59 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko

On Thu, 7 Nov 2013 23:43:52 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:


> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.

The only ones that I'm aware of are the prints from sysrq, which
include an ftrace_dump as well. These have caused issues in the past,
but usually I do them on boxes that I reboot afterward anyway.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 22:54           ` Frederic Weisbecker
@ 2013-11-07 23:01             ` Jan Kara
  2013-11-07 23:31               ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-07 23:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > > But then, who's going to process that work if every CPU is idle?
> > > >   Have a look into irq_work_queue(). There is:
> > > >         /*
> > > >          * If the work is not "lazy" or the tick is stopped, raise the irq
> > > >          * work interrupt (if supported by the arch), otherwise, just wait
> > > >          * for the next tick. We do this even for unbound work to make sure
> > > >          * *some* CPU will be doing the work.
> > > >          */
> > > >         if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > > >                 if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > > >                         arch_irq_work_raise();
> > > >         }
> > > > 
> > > >   So we raise an interrupt if there would be no timer ticking (which is
> > > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > > patches...
> > > 
> > > Ok but we raise that interrupt locally, not to the other CPUs.
> >   True, but that doesn't really matter in this case. Any CPU (including the
> > local one) can handle the unbound work. So from the definition of the
> > unbound work things are OK.
> 
> I don't see how that can be ok. You want to offline a work because the
> local CPU can't handle it, right? If the local CPU can handle it you can
> just use local irq works.
> 
> > Regarding my use for printk - if all (other) CPUs are idle then we can
> > easily afford making the current cpu busy printing, that's not a problem.
> > There's nothing else to do than to print what's remaining in the printk
> > buffer...
> 
> So if the current CPU can handle it, what is the problem?
  I hope this gets cleared up in my other email. But to make sure: if the
other CPUs are idle (i.e. not appending to the printk buffer), we can well
handle the printing on the current CPU (with some breaks to allow
interrupts to be served, etc.). If other CPUs are also appending to the
printk buffer, that's when we really want to push the work to other CPUs.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 22:57     ` Jan Kara
@ 2013-11-07 23:21       ` Frederic Weisbecker
  2013-11-07 23:37         ` Steven Rostedt
  2013-11-11 21:54       ` Pavel Machek
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 23:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, Michal Hocko, Steven Rostedt

On Thu, Nov 07, 2013 at 11:57:33PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> > 2013/11/7 Jan Kara <jack@suse.cz>:
> > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > are reported by our customers) when other CPUs are using printk heavily
> > > and serial console makes printing slow. Despite serial console drivers
> > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > because interrupts are disabled for the whole time console_unlock() runs
> > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > >
> > > In my artificial testing I can also easily trigger a situation when disk
> > > disappears from the system apparently because interrupt from it wasn't
> > > served for too long. This is why just silencing watchdogs isn't a
> > > reliable solution to the problem and we simply have to avoid spending
> > > too long in console_unlock() with interrupts disabled.
> > >
> > > The solution this patch works toward is to postpone printing to a later
> > > moment / different CPU when we already printed over X characters in
> > > current console_unlock() invocation. This is a crude heuristic but
> > > measuring time we spent printing doesn't seem to be really viable - we
> > > cannot rely on high resolution time being available and with interrupts
> > > disabled jiffies are not updated. User can tune the value X via
> > > printk.offload_chars kernel parameter.
> > >
> > > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > When a message takes tens of seconds to be printed, it usually means
> > we are in trouble somehow :)
> > I wonder what printk source can trigger such a high volume.
>   Machines with tens of processors and thousands of scsi devices. When
> device discovery happens on boot, all processors are busily reporting new
> scsi devices and one poor looser is bound to do the printing for ever and
> ever until the machine dies...
> 
> Or try running sysrq-t on a large machine with serial console connected. The
> machine will die because of lockups (although in this case I agree it is more
> of a problem of sysrq-t doing lots of printing in interrupt-disabled
> context).
> 
> > May be cutting some huge message into smaller chunks could help? That
> > would re enable interrupts between each call.
> > 
> > It's hard to tell without the context, but using other CPUs for
> > rescuing doesn't look like a good solution. What if the issue happens
> > in UP to begin with?
>   The real trouble in practice is that while one cpu is doing printing,
> other cpus are appending to the printk buffer. So the cpu can be printing
> for a *long* time. So offloading the work to other cpus which are also
> appending messages seems as a fair thing to do.

Ok I see now.

But then this irq_work-based solution won't work if, say, you run in full
dynticks mode. Also, the hook in the timer interrupt is something I wish
we could get rid of on archs that can trigger self-IPIs.

Notwithstanding that, it's going to have scalability issues, as irq work
then converges to a single list for unbound works.

Offloading to a workqueue would be perhaps better, and writing to the serial
console could then be done with interrupts enabled, preemptible context, etc...


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 23:01             ` Jan Kara
@ 2013-11-07 23:31               ` Steven Rostedt
  2013-11-08 10:18                 ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2013-11-07 23:31 UTC (permalink / raw)
  To: Jan Kara; +Cc: Frederic Weisbecker, Andrew Morton, LKML, Michal Hocko

On Fri, 8 Nov 2013 00:01:11 +0100
Jan Kara <jack@suse.cz> wrote:

> On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:

> > So if the current CPU can handle it, what is the problem?
>   I hope this gets cleared out in my other email. But to make sure: If
> other CPUs are idle (i.e. not appending to the printk buffer), we can well
> handle the printing on the current CPU (with some breaks to allow
> interrupts to be served etc.). If other CPUs are also appending to printk
> buffer, that's where we really want to push the work to other CPUs.

I guess the question is: how does it migrate? I guess that's not so
clear. Or do you just hope that the timer tick on another CPU will come
in first and finish the job for you? Since the list is global, all CPUs
get to see it.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 23:21       ` Frederic Weisbecker
@ 2013-11-07 23:37         ` Steven Rostedt
  2013-11-07 23:44           ` Frederic Weisbecker
  2013-11-07 23:46           ` Frederic Weisbecker
  0 siblings, 2 replies; 33+ messages in thread
From: Steven Rostedt @ 2013-11-07 23:37 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko

On Fri, 8 Nov 2013 00:21:51 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:
 
> Ok I see now.
> 
> But then this irq_work based solution won't work if, say, you run in full dynticks
> mode. Also the hook on the timer interrupt is something that I wish we get rid
> of on archs that can trigger self-IPIs.

Do we really want that? What about users that use LAZY? That is, work
that isn't important enough to trigger right now (avoiding the added
interrupt expense).

> 
> Notwithstanding it's going to have scalability issues as irq work then converges
> to a single list for unbound works.

Well, it doesn't seem to be something that would be called often. All
CPUs checking a variable that seldom changes should not cause any
scalability issues - unless, of course, it happens to share a cache line
with a variable that does change often.

> 
> Offloading to a workqueue would be perhaps better, and writing to the serial
> console could then be done with interrupts enabled, preemptible context, etc...

Oh God no ;-)  Adding workqueue logic into printk just spells a
nightmare of much more complexity for a critical kernel infrastructure.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 23:37         ` Steven Rostedt
@ 2013-11-07 23:44           ` Frederic Weisbecker
  2013-11-07 23:46           ` Frederic Weisbecker
  1 sibling, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 23:44 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko

On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>  
> > Ok I see now.
> > 
> > But then this irq_work based solution won't work if, say, you run in full dynticks
> > mode. Also the hook on the timer interrupt is something that I wish we get rid
> > of on archs that can trigger self-IPIs.
> 
> Do we really want that? What about users that use LAZY? That is, the
> work isn't that important to trigger right now (added interrupt
> expense).

Ah right, I forgot that :)

> 
> > 
> > Notwithstanding it's going to have scalability issues as irq work then converges
> > to a single list for unbound works.
> 
> Well, it doesn't seem to be something that would be called often. All
> CPUs checking a variable that is seldom changed should not have any
> scalability issues. Unless of course it happens to share a cache line
> with a variable that does change often.

Right, if it were printk-specific code I wouldn't mind much, in fact. But
having that in such a globally available API as irq work makes me feel
uncomfortable.

> 
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

True, in fact I was thinking about raising an irq work that enqueues a workqueue item - convoluted, right? ;)

But it would make it safe, as irq work can be enqueued from anywhere, and
since it seems we can have a high volume of stuff to print to the serial
device, a process context looks much saner to me.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 23:37         ` Steven Rostedt
  2013-11-07 23:44           ` Frederic Weisbecker
@ 2013-11-07 23:46           ` Frederic Weisbecker
  2013-11-08 10:21             ` Jan Kara
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2013-11-07 23:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jan Kara, Andrew Morton, LKML, Michal Hocko

On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

But yeah, that's scary: it means workqueues themselves can't printk
safely. So you're right after all.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu
  2013-11-07 23:31               ` Steven Rostedt
@ 2013-11-08 10:18                 ` Jan Kara
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-08 10:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jan Kara, Frederic Weisbecker, Andrew Morton, LKML, Michal Hocko

On Thu 07-11-13 18:31:42, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:01:11 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:
> 
> > > So if the current CPU can handle it, what is the problem?
> >   I hope this gets cleared out in my other email. But to make sure: If
> > other CPUs are idle (i.e. not appending to the printk buffer), we can well
> > handle the printing on the current CPU (with some breaks to allow
> > interrupts to be served etc.). If other CPUs are also appending to printk
> > buffer, that's where we really want to push the work to other CPUs.
> 
> I guess the question is, how does it migrate? I guess that's not so
> clear. Or do you just hope that the timer tick on another CPU will come
> in first and finish the job for you? As the list is global and all CPUs
> get to see it.
  There are two ways the migration can happen:
1) a tick happens first on some other cpu, which thus picks up the work
2) a printk happens first on some other cpu, which thus picks up the
   printing in its console_unlock() call.

  I agree that in both cases we depend on probability; there's no real
guarantee the migration happens. But in my testing this seems to be enough.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 23:46           ` Frederic Weisbecker
@ 2013-11-08 10:21             ` Jan Kara
  2013-11-22 23:27               ` Andrew Morton
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-08 10:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, Jan Kara, Andrew Morton, LKML, Michal Hocko

On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > On Fri, 8 Nov 2013 00:21:51 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > 
> > > Offloading to a workqueue would be perhaps better, and writing to the serial
> > > console could then be done with interrupts enabled, preemptible context, etc...
> > 
> > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > nightmare of much more complexity for a critical kernel infrastructure.
> 
> But yeah that's scary, that means workqueues itself can't printk that safely.
> So, you're right after all.
  Yeah, we've been there (that was actually my initial proposal). But
Andrew and Steven (rightfully) objected and suggested irq_work should be
used instead.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-07 22:57     ` Jan Kara
  2013-11-07 23:21       ` Frederic Weisbecker
@ 2013-11-11 21:54       ` Pavel Machek
  2013-11-11 22:17         ` Jan Kara
  1 sibling, 1 reply; 33+ messages in thread
From: Pavel Machek @ 2013-11-11 21:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Frederic Weisbecker, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

Hi!

> > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > are reported by our customers) when other CPUs are using printk heavily
> > > and serial console makes printing slow. Despite serial console drivers
> > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > because interrupts are disabled for the whole time console_unlock() runs
> > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > >
> > > In my artificial testing I can also easily trigger a situation when disk
> > > disappears from the system apparently because interrupt from it wasn't
> > > served for too long. This is why just silencing watchdogs isn't a
> > > reliable solution to the problem and we simply have to avoid spending
> > > too long in console_unlock() with interrupts disabled.
> > >
> > > The solution this patch works toward is to postpone printing to a later
> > > moment / different CPU when we already printed over X characters in
> > > current console_unlock() invocation. This is a crude heuristic but
> > > measuring time we spent printing doesn't seem to be really viable - we
> > > cannot rely on high resolution time being available and with interrupts
> > > disabled jiffies are not updated. User can tune the value X via
> > > printk.offload_chars kernel parameter.
> > >
> > > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > When a message takes tens of seconds to be printed, it usually means
> > we are in trouble somehow :)
> > I wonder what printk source can trigger such a high volume.
>   Machines with tens of processors and thousands of scsi devices. When
> device discovery happens on boot, all processors are busily reporting new
> scsi devices and one poor loser is bound to do the printing for ever and
> ever until the machine dies...

Dunno. In these cases, would it make sense to:

1) reduce amount of text printed

2) just print [XXX characters lost] on overruns?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-11 21:54       ` Pavel Machek
@ 2013-11-11 22:17         ` Jan Kara
  2013-11-16 11:35           ` Pavel Machek
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-11-11 22:17 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Frederic Weisbecker, Andrew Morton, LKML, Michal Hocko,
	Steven Rostedt

On Mon 11-11-13 22:54:15, Pavel Machek wrote:
> Hi!
> 
> > > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > > are reported by our customers) when other CPUs are using printk heavily
> > > > and serial console makes printing slow. Despite serial console drivers
> > > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > > because interrupts are disabled for the whole time console_unlock() runs
> > > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > > >
> > > > In my artificial testing I can also easily trigger a situation when disk
> > > > disappears from the system apparently because interrupt from it wasn't
> > > > served for too long. This is why just silencing watchdogs isn't a
> > > > reliable solution to the problem and we simply have to avoid spending
> > > > too long in console_unlock() with interrupts disabled.
> > > >
> > > > The solution this patch works toward is to postpone printing to a later
> > > > moment / different CPU when we already printed over X characters in
> > > > current console_unlock() invocation. This is a crude heuristic but
> > > > measuring time we spent printing doesn't seem to be really viable - we
> > > > cannot rely on high resolution time being available and with interrupts
> > > > disabled jiffies are not updated. User can tune the value X via
> > > > printk.offload_chars kernel parameter.
> > > >
> > > > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > 
> > > When a message takes tens of seconds to be printed, it usually means
> > > we are in trouble somehow :)
> > > I wonder what printk source can trigger such a high volume.
> >   Machines with tens of processors and thousands of scsi devices. When
> > device discovery happens on boot, all processors are busily reporting new
> > scsi devices and one poor loser is bound to do the printing for ever and
> > ever until the machine dies...
> 
> Dunno. In these cases, would it make sense to:
> 
> 1) reduce amount of text printed
  I thought about this as well. But
a) It doesn't seem practical, as you would have to modify lots of drivers
   and keep them rather silent. That seems rather fragile. Plus you would
   not display some potentially useful information.
b) It doesn't address the real underlying problem: the way printk() is
   currently implemented, there is no bound on the time a CPU spends in
   the loop printing from the buffer to the console. And the fact that
   this loop sometimes runs with interrupts disabled makes the situation
   even worse.
 
> 2) just print [XXX characters lost] on overruns?
  We don't overrun the printk buffer, so no characters are lost. It just
takes too long to feed the whole printk buffer through the serial console...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-11 22:17         ` Jan Kara
@ 2013-11-16 11:35           ` Pavel Machek
  0 siblings, 0 replies; 33+ messages in thread
From: Pavel Machek @ 2013-11-16 11:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Frederic Weisbecker, Andrew Morton, LKML, Michal Hocko, Steven Rostedt

Hi!

> > > > > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > 
> > > > When a message takes tens of seconds to be printed, it usually means
> > > > we are in trouble somehow :)
> > > > I wonder what printk source can trigger such a high volume.
> > >   Machines with tens of processors and thousands of scsi devices. When
> > > device discovery happens on boot, all processors are busily reporting new
> > > scsi devices and one poor loser is bound to do the printing for ever and
> > > ever until the machine dies...
> > 
> > Dunno. In these cases, would it make sense to:
> > 
> > 1) reduce amount of text printed
>   I thought about this as well. But
> a) It doesn't seem practical as you would have to modify lots of drivers
>    and keep them rather silent. That seems rather fragile. Plus you will
>    not display some potentially useful information.
> b) It doesn't address the real underlying problem that the way printk() is
>    currently implemented, there is no bound on the time CPU spends in the
>    loop printing from buffer to console. And the fact that this loop
>    sometimes happens with interrupts disabled makes the situation even
>    worse.
>  
> > 2) just print [XXX characters lost] on overruns?
>   We don't overrun the printk buffer so no characters are lost. It just
> takes too long to feed the whole printk buffer through serial console...

Yes, I know. No characters are lost, but we spend seconds with
interrupts disabled, breaking the system. (SCSI timeouts? We do
keyboard autorepeat in software these days...)

Would it be better to just drop the characters?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-08 10:21             ` Jan Kara
@ 2013-11-22 23:27               ` Andrew Morton
  2013-11-25 12:08                 ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2013-11-22 23:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: Frederic Weisbecker, Steven Rostedt, LKML, Michal Hocko

On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara <jack@suse.cz> wrote:

> On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > > On Fri, 8 Nov 2013 00:21:51 +0100
> > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > > 
> > > > Offloading to a workqueue would be perhaps better, and writing to the serial
> > > > console could then be done with interrupts enabled, preemptible context, etc...
> > > 
> > > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > > nightmare of much more complexity for a critical kernel infrastructure.
> > 
> > But yeah that's scary, that means workqueues itself can't printk that safely.
> > So, you're right after all.
>   Yeah, we've been there (that was actually my initial proposal). But
> Andrew and Steven (rightfully) objected and suggested irq_work should be
> used instead.

I still hate the patchset and so does everyone else, including you ;)
There must be something smarter we can do.  Let's start by restating
the problem:

CPU A is in printk, emitting log_buf characters to a slow device. 
Meanwhile other CPUs come into printk(), see that the system is busy,
dump their load into log_buf then scram, leaving CPU A to do even more
work.

Correct so far?

If so, what is the role of local_irq_disabled() in this?  Did CPU A
call printk() with local interrupts disabled, or is printk (or the
console driver) causing the irqs-off condition?  Where and why is this
IRQ disablement happening?


Could we fix this problem by not permitting CPUs B, C and D to DoS CPU
A?  When CPU B comes into printk() and sees that printk is busy, make
CPU A hand over to CPU B and let CPU A get out of there?



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-11-22 23:27               ` Andrew Morton
@ 2013-11-25 12:08                 ` Jan Kara
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2013-11-25 12:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Frederic Weisbecker, Steven Rostedt, LKML, Michal Hocko

On Fri 22-11-13 15:27:11, Andrew Morton wrote:
> On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> > > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > > > On Fri, 8 Nov 2013 00:21:51 +0100
> > > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > > > 
> > > > > Offloading to a workqueue would be perhaps better, and writing to the serial
> > > > > console could then be done with interrupts enabled, preemptible context, etc...
> > > > 
> > > > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > > > nightmare of much more complexity for a critical kernel infrastructure.
> > > 
> > > But yeah that's scary, that means workqueues itself can't printk that safely.
> > > So, you're right after all.
> >   Yeah, we've been there (that was actually my initial proposal). But
> > Andrew and Steven (rightfully) objected and suggested irq_work should be
> > used instead.
> 
> I still hate the patchset and so does everyone else, including you ;)
> There must be something smarter we can do.  Let's start by restating
> the problem:
> 
> CPU A is in printk, emitting log_buf characters to a slow device. 
> Meanwhile other CPUs come into printk(), see that the system is busy,
> dump their load into log_buf then scram, leaving CPU A to do even more
> work.
> 
> Correct so far?
  Yes, correct.

> If so, what is the role of local_irq_disabled() in this?  Did CPU A
> call printk() with local interrupts disabled, or is printk (or the
> console driver) causing the irqs-off condition?  Where and why is this
> IRQ disablement happening?
  So there are a couple of places where we disable interrupts.
a) call_console_drivers(), which does the printing to the console, is
   always called with interrupts disabled. Commonly it is called from
   console_unlock(), which takes care of disabling interrupts. I presume
   this is because we want to guard against interrupts doing something
   unexpected with the console while we are printing to it. But I don't
   understand console drivers well enough to be sure...
b) vprintk_emit() (the function usually calling console_unlock()) also
   disables interrupts to make updates of log_buf interrupt-safe. It
   calls console_unlock() with interrupts disabled, which seems
   unnecessary as that function takes care of disabling interrupts itself.
   This makes the situation somewhat worse because console_unlock() could
   otherwise enable interrupts from time to time. That being said, I've
   tried to fix this shortcoming in previous versions of the patch set but
   it didn't seem to make a difference - maybe
     local_irq_restore(flags);
     raw_spin_lock_irqsave(&logbuf_lock, flags);
   which is what console_unlock() does, doesn't give the APIC enough time
   to deliver blocked interrupts.
c) printk() itself is sometimes called with interrupts disabled. This
   happens a lot, for example from sysrq handlers, which is sometimes
   unpleasant (sysrq-s simply kills large machines) but not a primary
   concern for me. It doesn't seem to happen too often after early boot
   is finished (in particular the SCSI messages that make machines
   unbootable seem to be generated from kernel thread context). But there
   are some messages like this, and if we are unlucky and get caught in
   such a printk, the machine dies. So I believe we have to reliably
   handle the situation when printk() itself gets called with interrupts
   disabled.
  
> Could we fix this problem by not permitting CPUs B, C and D to DoS CPU
> A?  When CPU B comes into printk() and sees that printk is busy, make
> CPU A hand over to CPU B and let CPU A get out of there?
  We could. In fact I was proposing this in
https://lkml.org/lkml/2013/9/5/329
  It has the advantage that we won't rely on irq work. If we changed
console_trylock() to console_lock() in console_trylock_for_printk() and
made console_unlock() only print the messages in log_buf on function entry,
it would even make things simpler, but it would basically undo your change
from ages ago and I'm not sure about the consequences. All printk()s could
suddenly block much more since printk() would essentially become
completely synchronous.

We could try some fancier compromise between the current "completely async
printk" and the ancient "completely synchronous printk", but then it gets
more complex, and so far dependence on irq work seemed like the lesser
evil to me.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-21  8:08 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
@ 2013-08-21 19:06   ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2013-08-21 19:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, mhocko, hare

On Wed, 21 Aug 2013 10:08:31 +0200
Jan Kara <jack@suse.cz> wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
> 
> In my artificial testing I can also easily trigger a situation when disk
> disappears from the system apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock() with interrupts disabled.
> 
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over 1000 characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. The value 1000 was chosen so that
> things are still bearable with 9600 baud serial console and OTOH it
> shouldn't cause offloading of printing in common cases.
> 

BTW, you can also test by running:

 # cd /sys/kernel/debug/tracing
 # echo function > current_tracer
 # sleep 1
 # echo z > /proc/sysrq-trigger


> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-21  8:08 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
@ 2013-08-21  8:08 ` Jan Kara
  2013-08-21 19:06   ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-08-21  8:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, mhocko, hare, Steven Rostedt, Jan Kara

A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and a serial console makes printing slow. Although serial console drivers
call touch_nmi_watchdog(), this triggers softlockup warnings because
interrupts are disabled for the whole time console_unlock() runs (e.g.
vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artificial testing I can also easily trigger a situation where a
disk disappears from the system, apparently because commands to / from it
could not be delivered for long enough. This is why just silencing
watchdogs isn't a reliable solution to the problem and we simply have to
avoid spending too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we have already printed over 1000 characters
in the current console_unlock() invocation. This is a crude heuristic,
but measuring the time we spend printing doesn't seem viable - we cannot
rely on high resolution time being available, and with interrupts
disabled jiffies are not updated. The value 1000 was chosen so that
things are still bearable with a 9600 baud serial console and OTOH it
shouldn't cause offloading of printing in common cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/sysctl/kernel.txt | 19 +++++++++++++
 include/linux/printk.h          |  1 +
 kernel/printk/printk.c          | 62 ++++++++++++++++++++++++++++++++++-------
 kernel/sysctl.c                 |  8 ++++++
 4 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ab7d16e..0967bcc 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -57,6 +57,7 @@ show up in /proc/sys/kernel:
 - printk_delay
 - printk_ratelimit
 - printk_ratelimit_burst
+- printk_offload_chars
 - randomize_va_space
 - real-root-dev               ==> Documentation/initrd.txt
 - reboot-cmd                  [ SPARC only ]
@@ -524,6 +525,24 @@ send before ratelimiting kicks in.
 
 ==============================================================
 
+printk_offload_chars:
+
+Printing to the console can be relatively slow, especially in the
+case of a serial console. When there is intensive printing happening
+from several cpus (as is the case during boot), a cpu can spend
+significant time (seconds or more) doing printing. To avoid
+softlockups, lost interrupts, and similar problems, a cpu stops
+printing to the console after 'printk_offload_chars' characters.
+Another cpu will then continue printing. A higher value means
+possibly longer interrupt and other latencies, but lower printing
+latency and a lower chance that something goes wrong during a system
+crash and an important message is not printed.
+
+By default printing is offloaded from a cpu after it has printed 1000
+characters. Setting printk_offload_chars to 0 disables offloading.
+
+==============================================================
+
 randomize_va_space:
 
 This option can be used to select the type of process address
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 22c7052..135d958 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -141,6 +141,7 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
+extern int printk_offload_chars;
 
 extern void wake_up_klogd(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9208e17..e97971b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -253,6 +253,12 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How many characters we can print in one call of printk before offloading
+ * printing work. Tunable via /proc/sys/kernel/printk_offload_chars.
+ */
+int __read_mostly printk_offload_chars = 1000;
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -2026,31 +2032,43 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	return printk_offload_chars && printed_chars > printk_offload_chars;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, __console_unlock() emits the output prior to
+ * releasing the lock. However, we need not write all the data in the buffer
+ * if we would hog the CPU for too long. Returns true if there is more data
+ * in the buffer that still needs printing.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * __console_unlock(); may be called from any context.
  */
-void console_unlock(void)
+static bool __console_unlock(void)
 {
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
-		return;
+		return false;
 	}
 
 	console_may_schedule = 0;
@@ -2078,6 +2096,8 @@ again:
 skip:
 		if (console_seq == log_next_seq)
 			break;
+		if (cpu_stop_printing(printed_chars))
+			break;
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2108,6 +2128,7 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
 		start_critical_timings();
+		printed_chars += len;
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
@@ -2131,13 +2152,20 @@ skip:
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (retry && !cpu_stop_printing(printed_chars)) {
+		if (console_trylock())
+			goto again;
+		/*
+		 * Someone else is printing so the caller doesn't have to
+		 * schedule irq work
+		 */
+		retry = false;
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
+	return retry;
 }
-EXPORT_SYMBOL(console_unlock);
 
 /**
  * console_conditional_schedule - yield the CPU if required
@@ -2508,6 +2536,20 @@ int printk_sched(const char *fmt, ...)
 }
 
 /*
+ * This is a wrapper for __console_unlock() that makes sure the rest of the
+ * buffer is printed in the future.
+ */
+void console_unlock(void)
+{
+	if (__console_unlock()) {
+		/* Leave the rest of printing for a timer tick */
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	}
+}
+EXPORT_SYMBOL(console_unlock);
+
+/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 07f6fc4..002788f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -772,6 +772,14 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &ten_thousand,
 	},
 	{
+		.procname	= "printk_offload_chars",
+		.data		= &printk_offload_chars,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+	},
+	{
 		.procname	= "dmesg_restrict",
 		.data		= &dmesg_restrict,
 		.maxlen		= sizeof(int),
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-15  7:52     ` Jan Kara
@ 2013-08-15 13:26       ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2013-08-15 13:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, Andrew Morton, mhocko

On Thu, 15 Aug 2013 09:52:51 +0200
Jan Kara <jack@suse.cz> wrote:

 
> > > +/*
> > > + * How much characters can we print in one call of printk before offloading
> > > + * printing work
> > > + */
> > > +#define MAX_PRINTK_CHARS 1000
> > 
> > This really needs to be configurable. 1000 may be good for you, but may
> > not be good for others. Heck, we should be able to disable this
> > offloading too.
> > 
> > This should probably be added to the printk /proc/sys/kernel/... files.
> > 
> > If it is set to zero, then disable any offloading.
>   OK, I can easily do this. I'm somewhat skeptical people would want to
> tune this but maybe they would.
> 

Most people wouldn't. But as this is a new heuristic, if things are not
behaving as people expect, it would be good to be able to modify that
heuristic to see if that was the issue. Including disabling it.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-15  1:38   ` Steven Rostedt
@ 2013-08-15  7:52     ` Jan Kara
  2013-08-15 13:26       ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-08-15  7:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Jan Kara, LKML, Andrew Morton, mhocko

On Wed 14-08-13 21:38:18, Steven Rostedt wrote:
> On Wed, 14 Aug 2013 15:28:27 +0200
> Jan Kara <jack@suse.cz> wrote:
> 
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > 
> > In my artificial testing I can also easily trigger a situation when disk
> > disappears from the system apparently because commands to / from it
> > could not be delivered for long enough. This is why just silencing
> > watchdogs isn't a reliable solution to the problem and we simply have to
> > avoid spending too long in console_unlock() with interrupts disabled.
> > 
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over 1000 characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. The value 1000 was chosen so that
> > things are still bearable with 9600 baud serial console and OTOH it
> > shouldn't cause offloading of printing in common cases.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/printk/printk.c | 62 ++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 9208e17..28c220a 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2026,31 +2026,49 @@ out:
> >  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> >  }
> >  
> > +/*
> > + * How many characters can we print in one printk call before offloading
> > + * the printing work
> > + */
> > +#define MAX_PRINTK_CHARS 1000
> 
> This really needs to be configurable. 1000 may be good for you, but may
> not be good for others. Heck, we should be able to disable this
> offloading too.
> 
> This should probably be added to the printk /proc/sys/kernel/... files.
> 
> If it is set to zero, then disable any offloading.
  OK, I can easily do this. I'm somewhat skeptical that people would want to
tune this, but maybe they would.

								Honza

> > +
> > +/* Should we stop printing on this cpu? */
> > +static bool cpu_stop_printing(int printed_chars)
> > +{
> > +	/* Oops? Print everything now to maximize chances user will see it */
> > +	if (oops_in_progress)
> > +		return false;
> > +	return printed_chars > MAX_PRINTK_CHARS;
> > +}
> > +
> >  /**
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-14 13:28 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
@ 2013-08-15  1:38   ` Steven Rostedt
  2013-08-15  7:52     ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2013-08-15  1:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, Andrew Morton, mhocko

On Wed, 14 Aug 2013 15:28:27 +0200
Jan Kara <jack@suse.cz> wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and a serial console makes printing slow. Although serial console drivers
> call touch_nmi_watchdog(), this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
> 
> In my artificial testing I can also easily trigger a situation where a disk
> disappears from the system apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock() with interrupts disabled.
> 
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over 1000 characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. The value 1000 was chosen so that
> things are still bearable with 9600 baud serial console and OTOH it
> shouldn't cause offloading of printing in common cases.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/printk/printk.c | 62 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9208e17..28c220a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2026,31 +2026,49 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * How many characters can we print in one printk call before offloading
> + * the printing work
> + */
> +#define MAX_PRINTK_CHARS 1000

This really needs to be configurable. 1000 may be good for you, but may
not be good for others. Heck, we should be able to disable this
offloading too.

This should probably be added to the printk /proc/sys/kernel/... files.

If it is set to zero, then disable any offloading.

-- Steve


> +
> +/* Should we stop printing on this cpu? */
> +static bool cpu_stop_printing(int printed_chars)
> +{
> +	/* Oops? Print everything now to maximize chances user will see it */
> +	if (oops_in_progress)
> +		return false;
> +	return printed_chars > MAX_PRINTK_CHARS;
> +}
> +
>  /**

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/4] printk: Defer printing to irq work when we printed too much
  2013-08-14 13:28 [PATCH 0/4 v5] Avoid softlockups in console_unlock() Jan Kara
@ 2013-08-14 13:28 ` Jan Kara
  2013-08-15  1:38   ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2013-08-14 13:28 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Steven Rostedt, mhocko, Jan Kara

A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and a serial console makes printing slow. Although serial console drivers
call touch_nmi_watchdog(), this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artificial testing I can also easily trigger a situation where a disk
disappears from the system apparently because commands to / from it
could not be delivered for long enough. This is why just silencing
watchdogs isn't a reliable solution to the problem and we simply have to
avoid spending too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we already printed over 1000 characters in
current console_unlock() invocation. This is a crude heuristic but
measuring time we spent printing doesn't seem to be really viable - we
cannot rely on high resolution time being available and with interrupts
disabled jiffies are not updated. The value 1000 was chosen so that
things are still bearable with 9600 baud serial console and OTOH it
shouldn't cause offloading of printing in common cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 62 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9208e17..28c220a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2026,31 +2026,49 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * How many characters can we print in one printk call before offloading
+ * the printing work
+ */
+#define MAX_PRINTK_CHARS 1000
+
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	return printed_chars > MAX_PRINTK_CHARS;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, __console_unlock() emits the output prior to
+ * releasing the lock. However, we need not write all the data in the buffer
+ * if that would hog the CPU for too long. The function returns true if there
+ * is more data in the buffer that needs printing.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * __console_unlock(); may be called from any context.
  */
-void console_unlock(void)
+static bool __console_unlock(void)
 {
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
-		return;
+		return false;
 	}
 
 	console_may_schedule = 0;
@@ -2078,6 +2096,8 @@ again:
 skip:
 		if (console_seq == log_next_seq)
 			break;
+		if (cpu_stop_printing(printed_chars))
+			break;
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2108,6 +2128,7 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
 		start_critical_timings();
+		printed_chars += len;
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
@@ -2131,13 +2152,20 @@ skip:
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (retry && !cpu_stop_printing(printed_chars)) {
+		if (console_trylock())
+			goto again;
+		/*
+		 * Someone else is printing so the caller doesn't have to
+		 * schedule irq work
+		 */
+		retry = false;
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
+	return retry;
 }
-EXPORT_SYMBOL(console_unlock);
 
 /**
  * console_conditional_schedule - yield the CPU if required
@@ -2508,6 +2536,20 @@ int printk_sched(const char *fmt, ...)
 }
 
 /*
+ * This is a wrapper for __console_unlock() that makes sure the rest of buffer
+ * is printed in future.
+ */
+void console_unlock(void)
+{
+	if (__console_unlock()) {
+		/* Leave the rest of printing for a timer tick */
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	}
+}
+EXPORT_SYMBOL(console_unlock);
+
+/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2013-11-25 12:08 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-07 21:48 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
2013-11-07 21:48 ` [PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead Jan Kara
2013-11-07 21:48 ` [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu Jan Kara
2013-11-07 22:13   ` Frederic Weisbecker
2013-11-07 22:19     ` Jan Kara
2013-11-07 22:23       ` Frederic Weisbecker
2013-11-07 22:50         ` Jan Kara
2013-11-07 22:54           ` Frederic Weisbecker
2013-11-07 23:01             ` Jan Kara
2013-11-07 23:31               ` Steven Rostedt
2013-11-08 10:18                 ` Jan Kara
2013-11-07 22:32       ` Frederic Weisbecker
2013-11-07 21:48 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
2013-11-07 22:43   ` Frederic Weisbecker
2013-11-07 22:57     ` Jan Kara
2013-11-07 23:21       ` Frederic Weisbecker
2013-11-07 23:37         ` Steven Rostedt
2013-11-07 23:44           ` Frederic Weisbecker
2013-11-07 23:46           ` Frederic Weisbecker
2013-11-08 10:21             ` Jan Kara
2013-11-22 23:27               ` Andrew Morton
2013-11-25 12:08                 ` Jan Kara
2013-11-11 21:54       ` Pavel Machek
2013-11-11 22:17         ` Jan Kara
2013-11-16 11:35           ` Pavel Machek
2013-11-07 22:59     ` Steven Rostedt
2013-11-07 21:48 ` [PATCH 4/4] printk: Use unbound irq work for printing and waking Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2013-08-21  8:08 [PATCH 0/4 v6] Avoid softlockups in console_unlock() Jan Kara
2013-08-21  8:08 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
2013-08-21 19:06   ` Steven Rostedt
2013-08-14 13:28 [PATCH 0/4 v5] Avoid softlockups in console_unlock() Jan Kara
2013-08-14 13:28 ` [PATCH 3/4] printk: Defer printing to irq work when we printed too much Jan Kara
2013-08-15  1:38   ` Steven Rostedt
2013-08-15  7:52     ` Jan Kara
2013-08-15 13:26       ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).