All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6 v2] printk: Softlockup avoidance
@ 2015-10-26  4:52 Jan Kara
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Hello,

here is another posting of the revived patch set to avoid lockups during heavy
printing. Lately there were several attempts at dealing with softlockups due to
heavy printk traffic [1] [2] and I've been also privately pinged by couple of
people about the state of the patch set, so I've decided to revive the patch
set. Patches (their older version) are present in SUSE enterprise kernels for
several years and we didn't observe any issues with them.

Patch set passes my stress testing where serial console is slowed down to print
~1000 chars per second and there are 100 delayed works printing together some
64k of text and in parallel modules are inserted which generates quite some
additional messages, stop_machine() calls etc.

Changes since v1:
* printing kthreads now check for kthread_should_stop()
* printing kthreads are now bound to CPUs so that scheduler cannot decide to
  schedule both kthreads on one CPU which effectively makes it impossible to
  hand over printing between them. This happened relatively frequently in
  virtual machines.
* use printk buffer draining code in panic() to force all messages out when
  the system is dying
* better naming of logbuf flushing functions suggested by AKPM
* fixed irq safety of printing lock as pointed out by AKMP
* fixed various smaller issues pointed by AKPM


Changes since the the old patch set [3]:
* I have replaced the state machine to pass printing and spinning on
  console_sem with a simple spinlock which makes the code
  somewhat easier to read and verify.
* Some of the patches were merged so I dropped them.

To remind the original problem:

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious when printk() gets called under some critical
spinlock or with interrupts disabled.
    
In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.

This series addresses the problem in the following way: If CPU has printed
more that printk_offload (defaults to 1000) characters, it wakes up one
of dedicated printk kthreads (we don't use workqueue because that has
deadlock potential if printk was called from workqueue code). Once we find
out kthread is spinning on a lock, we stop printing, drop console_sem, and
let kthread continue printing. Since there are two printing kthreads, they
will pass printing between them and thus no CPU gets hogged by printing.

								Honza

[1] https://lkml.org/lkml/2015/7/8/215
[2] http://marc.info/?l=linux-kernel&m=143929238407816&w=2
[3] https://lkml.org/lkml/2014/3/17/68

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2016-03-01 17:22   ` Denys Vlasenko
  2015-10-26  4:52 ` [PATCH 2/7] printk: Start printing handover kthreads on demand Jan Kara
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious problem if the printk() calling
console_unlock() was called with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.

This patch implements a mechanism where after printing specified number
of characters (tunable as a kernel parameter printk.offload_chars), CPU
doing printing asks for help by waking up one of dedicated kthreads.  As
soon as the printing CPU notices kthread got scheduled and is spinning
on print_lock dedicated for that purpose, it drops console_sem,
print_lock, and exits console_unlock(). Kthread then takes over printing
instead. This way no CPU should spend printing too long even if there
is heavy printk traffic.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt |  15 ++++
 kernel/printk/printk.c              | 173 ++++++++++++++++++++++++++++++++----
 2 files changed, 171 insertions(+), 17 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 22a4b687ea5b..df8adee975ba 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2958,6 +2958,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
+	printk.offload_chars=
+			Printing to console can be relatively slow especially
+			in case of serial console. When there is intensive
+			printing happening from several cpus (as is the case
+			during boot), a cpu can be spending significant time
+			(seconds or more) doing printing. To avoid softlockups,
+			lost interrupts, and similar problems other cpus
+			will take over printing after the currently printing
+			cpu has printed 'printk.offload_chars' characters.
+			Higher value means possibly longer interrupt and other
+			latencies but lower overhead of printing due to handing
+			over of printing.
+			Format: <number> (0 = disabled)
+			default: 1000
+
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8f0324ef72ab..1b26263edfa7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
@@ -78,6 +79,29 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+/*
+ * This spinlock is taken when printing to console. It is used only so that
+ * we can spin on it when some other thread wants to take over printing to
+ * console.
+ */
+static DEFINE_SPINLOCK(print_lock);
+
+/*
+ * Number of printing threads spinning on print_lock. Can go away once
+ * spin_is_contended() is reliable.
+ */
+static atomic_t printing_tasks_spinning = ATOMIC_INIT(0);
+
+/*
+ * Number of kernel threads for offloading printing. We need at least two so
+ * that they can hand over printing from one to another one and thus switch
+ * CPUs.
+ */
+#define PRINTING_TASKS 2
+
+/* Wait queue printing kthreads sleep on when idle */
+static DECLARE_WAIT_QUEUE_HEAD(print_queue);
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
@@ -279,6 +303,18 @@ static u32 clear_idx;
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
+/*
+ * How many characters can we print in one call of printk before asking
+ * other cpus to continue printing. 0 means infinity. Tunable via
+ * printk.offload_chars kernel parameter. Our default 1000 means about
+ * 0.1s maximum latency due to printing.
+ */
+static unsigned int __read_mostly printk_offload_chars = 1000;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+	" cpu after this number of characters");
 
 /* Return log buffer address */
 char *log_buf_addr_get(void)
@@ -2208,15 +2244,41 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * Returns true iff there is other cpu waiting to take over printing. This
+ * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
+ * printing to some other cpu.
+ */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	if (!printk_offload_chars || printed_chars < printk_offload_chars)
+		return false;
+	/*
+	 * Make sure we load fresh value of printing_tasks_spinning. Matches
+	 * the barrier in printing_task()
+	 */
+	smp_rmb();
+	if (atomic_read(&printing_tasks_spinning))
+		return true;
+	wake_up(&print_queue);
+
+	return false;
+}
+
 /**
  * console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, console_unlock() emits the output prior to
+ * releasing the lock. However we need not write all the data in the buffer if
+ * we would hog the CPU for too long. In such case we try to hand over printing
+ * to a different cpu.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
@@ -2230,6 +2292,8 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	bool hand_over = false;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2241,13 +2305,20 @@ void console_unlock(void)
 	/* flush buffered message fragment immediately to console */
 	console_cont_flush(text, sizeof(text));
 again:
+	retry = false;
+	spin_lock_irqsave(&print_lock, flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
 		size_t len;
 		int level;
 
-		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		if (cpu_stop_printing(printed_chars)) {
+			hand_over = true;
+			break;
+		}
+
+		raw_spin_lock(&logbuf_lock);
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;
@@ -2265,8 +2336,10 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq == log_next_seq) {
+			raw_spin_unlock(&logbuf_lock);
 			break;
+		}
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2306,28 +2379,38 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, ext_text, ext_len, text, len);
 		start_critical_timings();
-		local_irq_restore(flags);
+		printed_chars += len;
 	}
-	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))
 		exclusive_console = NULL;
 
-	raw_spin_unlock(&logbuf_lock);
-
+	console_locked = 0;
 	up_console_sem();
 
+	if (!hand_over) {
+		/*
+		 * Someone could have filled up the buffer again, so re-check
+		 * if there's something to flush. We perform the check under
+		 * print_lock to save one cli / sti pair in the fast path.
+		 */
+		raw_spin_lock(&logbuf_lock);
+		retry = console_seq != log_next_seq;
+		raw_spin_unlock(&logbuf_lock);
+	}
+
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
-	 */
-	raw_spin_lock(&logbuf_lock);
-	retry = console_seq != log_next_seq;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	 * Release print_lock after console_sem so that printing_task()
+	 * succeeds in getting console_sem (unless someone else takes it and
+	 * then he'll be responsible for printing).
+         */
+	spin_unlock_irqrestore(&print_lock, flags);
 
+	/*
+	 * In case we cannot trylock the console_sem again, there's a new owner
+	 * and the console_unlock() from them will do the flush, no worries.
+	 */
 	if (retry && console_trylock())
 		goto again;
 
@@ -2654,9 +2737,52 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/* Kthread which takes over printing from a CPU which asks for help */
+static int printing_task(void *arg)
+{
+	unsigned long flags;
+	DEFINE_WAIT(wait);
+
+	while (1) {
+		prepare_to_wait_exclusive(&print_queue, &wait,
+					  TASK_INTERRUPTIBLE);
+		schedule();
+		finish_wait(&print_queue, &wait);
+		if (kthread_should_stop())
+			break;
+		/*
+		 * Disable preemption so that we do not get scheduled away from
+		 * the CPU after we get print_lock and before we are finished
+		 * with printing.
+		 */
+		preempt_disable();
+		atomic_inc(&printing_tasks_spinning);
+		/*
+		 * Store printing_tasks_spinning value before we spin. Matches
+		 * the barrier in cpu_stop_printing().
+		 */
+		smp_mb__after_atomic();
+		/*
+		 * Wait for currently printing thread to complete. We spin on
+		 * print_lock instead of waiting on console_sem since we don't
+		 * want to sleep once we got scheduled to make sure we take
+		 * over printing without depending on the scheduler.
+		 */
+		spin_lock_irqsave(&print_lock, flags);
+		atomic_dec(&printing_tasks_spinning);
+		spin_unlock_irqrestore(&print_lock, flags);
+		if (console_trylock())
+			console_unlock();
+		preempt_enable();
+	}
+	return 0;
+}
+
 static int __init printk_late_init(void)
 {
 	struct console *con;
+	int i;
+	struct task_struct *task;
 
 	for_each_console(con) {
 		if (!keep_bootcon && con->flags & CON_BOOT) {
@@ -2664,6 +2790,19 @@ static int __init printk_late_init(void)
 		}
 	}
 	hotcpu_notifier(console_cpu_notify, 0);
+
+	/* Does any handover of printing have any sence? */
+	if (num_possible_cpus() <= 1)
+		return 0;
+
+	for (i = 0; i < PRINTING_TASKS; i++) {
+		task = kthread_run(printing_task, NULL, "print/%d", i);
+		if (IS_ERR(task)) {
+			pr_err("printk: Cannot create printing thread: %ld\n",
+			       PTR_ERR(task));
+		}
+	}
+
 	return 0;
 }
 late_initcall(printk_late_init);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/7] printk: Start printing handover kthreads on demand
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2015-10-26  4:52 ` [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Start kthreads for handing over printing only when printk.offload_chars
is set to value > 0 (i.e., when print offloading gets enabled).

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 78 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b26263edfa7..b9bb4a7a6dff 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -98,6 +98,10 @@ static atomic_t printing_tasks_spinning = ATOMIC_INIT(0);
  * CPUs.
  */
 #define PRINTING_TASKS 2
+/* Pointers to printing kthreads */
+static struct task_struct *printing_kthread[PRINTING_TASKS];
+/* Serialization of changes to printk_offload_chars and kthread creation */
+static DEFINE_MUTEX(printk_kthread_mutex);
 
 /* Wait queue printing kthreads sleep on when idle */
 static DECLARE_WAIT_QUEUE_HEAD(print_queue);
@@ -303,6 +307,13 @@ static u32 clear_idx;
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
+
+static int offload_chars_set(const char *val, const struct kernel_param *kp);
+static struct kernel_param_ops offload_chars_ops = {
+	.set = offload_chars_set,
+	.get = param_get_uint,
+};
+
 /*
  * How many characters can we print in one call of printk before asking
  * other cpus to continue printing. 0 means infinity. Tunable via
@@ -311,7 +322,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
  */
 static unsigned int __read_mostly printk_offload_chars = 1000;
 
-module_param_named(offload_chars, printk_offload_chars, uint,
+module_param_cb(offload_chars, &offload_chars_ops, &printk_offload_chars,
 		   S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
 	" cpu after this number of characters");
@@ -2778,12 +2789,61 @@ static int printing_task(void *arg)
 	return 0;
 }
 
-static int __init printk_late_init(void)
+static int printk_start_offload_kthreads(void)
 {
-	struct console *con;
 	int i;
 	struct task_struct *task;
 
+	/* Does handover of printing make any sense? */
+	if (printk_offload_chars == 0 || num_possible_cpus() <= 1)
+		return 0;
+	for (i = 0; i < PRINTING_TASKS; i++) {
+		if (printing_kthread[i])
+			continue;
+		task = kthread_run(printing_task, NULL, "print/%d", i);
+		if (IS_ERR(task))
+			goto out_err;
+		printing_kthread[i] = task;
+	}
+	return 0;
+out_err:
+	pr_err("printk: Cannot create printing thread: %ld\n", PTR_ERR(task));
+	/* Disable offloading if creating kthreads failed */
+	printk_offload_chars = 0;
+	return PTR_ERR(task);
+}
+
+static int offload_chars_set(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+
+	/* Protect against parallel change of printk_offload_chars */
+	mutex_lock(&printk_kthread_mutex);
+	ret = param_set_uint(val, kp);
+	if (ret) {
+		mutex_unlock(&printk_kthread_mutex);
+		return ret;
+	}
+	ret = printk_start_offload_kthreads();
+	mutex_unlock(&printk_kthread_mutex);
+	return ret;
+}
+
+static void printk_offload_init(void)
+{
+	mutex_lock(&printk_kthread_mutex);
+	if (num_possible_cpus() <= 1) {
+		/* Offloading doesn't make sense. Disable print offloading. */
+		printk_offload_chars = 0;
+	} else
+		printk_start_offload_kthreads();
+	mutex_unlock(&printk_kthread_mutex);
+}
+
+static int __init printk_late_init(void)
+{
+	struct console *con;
+
 	for_each_console(con) {
 		if (!keep_bootcon && con->flags & CON_BOOT) {
 			unregister_console(con);
@@ -2791,17 +2851,7 @@ static int __init printk_late_init(void)
 	}
 	hotcpu_notifier(console_cpu_notify, 0);
 
-	/* Does any handover of printing have any sence? */
-	if (num_possible_cpus() <= 1)
-		return 0;
-
-	for (i = 0; i < PRINTING_TASKS; i++) {
-		task = kthread_run(printing_task, NULL, "print/%d", i);
-		if (IS_ERR(task)) {
-			pr_err("printk: Cannot create printing thread: %ld\n",
-			       PTR_ERR(task));
-		}
-	}
+	printk_offload_init();
 
 	return 0;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
  2015-10-26  4:52 ` [PATCH 2/7] printk: Start printing handover kthreads on demand Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2015-10-26  4:56   ` Jan Kara
  2015-10-26  4:52 ` [PATCH 4/7] panic: Always flush printk buffer before panic Jan Kara
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

When there are lots of messages accumulated in printk buffer, printing
them (especially over serial console) can take a long time (tens of
seconds). stop_machine() will effectively make all cpus spin in
multi_cpu_stop() waiting for the CPU doing printing to print all the
messages which triggers NMI softlockup watchdog and RCU stall detector
which add even more to the messages to print. Since machine doesn't do
anything (except serving interrupts) during this time, also network
connections are dropped and other disturbances may happen.

Paper over the problem by waiting for printk buffer to be empty before
starting to stop CPUs. In theory a burst of new messages can be appended
to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100%
solution but it works OK in practice and I'm not aware of a reasonably
simple better solution.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/console.h | 11 +++++++++++
 kernel/printk/printk.c  | 25 +++++++++++++++++++++++++
 kernel/stop_machine.c   |  9 +++++++++
 3 files changed, 45 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index bd194343c346..96da462cdfeb 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -150,6 +150,17 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+#ifdef CONFIG_SMP
+extern void printk_log_buf_drain(void);
+#else
+/*
+ * In non-SMP kernels there won't be much to drain so save some code for tiny
+ * kernels.
+ */
+static inline void printk_log_buf_drain(void)
+{
+}
+#endif
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9bb4a7a6dff..8dc6c146d022 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2488,6 +2488,31 @@ struct tty_driver *console_device(int *index)
 	return driver;
 }
 
+/* For non-SMP kernels this function isn't used and would be pointless anyway */
+#ifdef CONFIG_SMP
+/*
+ * Wait until all messages accumulated in the printk buffer are printed to
+ * console. Note that as soon as this function returns, new messages may be
+ * added to the printk buffer by other CPUs.
+ */
+void printk_log_buf_drain(void)
+{
+	bool retry;
+	unsigned long flags;
+
+	while (1) {
+		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		retry = console_seq != log_next_seq;
+		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+		if (!retry || console_suspended)
+			break;
+		/* Cycle console_sem to wait for outstanding printing */
+		console_lock();
+		console_unlock();
+	}
+}
+#endif
+
 /*
  * Prevent further output on the passed console device so that (for example)
  * serial drivers can disable console output before suspending a port, and can
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5d5c88..e9496b4a3825 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/console.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -543,6 +544,14 @@ static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cp
 		return ret;
 	}
 
+	/*
+	 * If there are lots of outstanding messages, printing them can take a
+	 * long time and all cpus would be spinning waiting for the printing to
+	 * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the
+	 * printing here to avoid these.
+	 */
+	printk_log_buf_drain();
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&msdata, MULTI_STOP_PREPARE);
 	return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/7] panic: Always flush printk buffer before panic
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
                   ` (2 preceding siblings ...)
  2015-10-26  4:52 ` [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2015-10-26  4:52 ` [PATCH 5/7] printk: Add config option for disabling printk offloading Jan Kara
  2015-10-26  4:52 ` [PATCH 6/7] printk: Avoid scheduling printing threads on the same CPU Jan Kara
  5 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara, Vitaly Kuznetsov

In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. Vitaly is observing the
following:

- A crash is happening on one CPU and console_unlock() is being called
  on some other.

- console_unlock() tries to print out the buffer before releasing the
  lock and on slow console it takes time.

- in the meanwhile crashing CPU does lots of printk()-s with valuable
  data (which go to the logbuf) and sends IPIs to all other CPUs.

- console_unlock() finishes printing previous chunk and enables
  interrupts before trying to print out the rest, the CPU catches the
  IPI and never releases console lock.

This is not the only possible case: in VT/fb subsystems we have many
other console_lock()/console_unlock() users. Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.

Getting the whole console buffer printed out on crash is top priority.
So zap printk locks and print logbuf contents after all cpus have been
stopped.

Based on patch by Vitaly Kuznetsov.

CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Reported-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Kara <jack@suse.com>
---
 include/linux/console.h | 4 ++--
 kernel/panic.c          | 8 ++++++++
 kernel/printk/printk.c  | 5 ++++-
 kernel/stop_machine.c   | 2 +-
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 96da462cdfeb..f40084802f3f 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -151,13 +151,13 @@ extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
 #ifdef CONFIG_SMP
-extern void printk_log_buf_drain(void);
+extern void printk_log_buf_drain(bool panic);
 #else
 /*
  * In non-SMP kernels there won't be much to drain so save some code for tiny
  * kernels.
  */
-static inline void printk_log_buf_drain(void)
+static inline void printk_log_buf_drain(bool panic)
 {
 }
 #endif
diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff7560b..d07ed830a9fb 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -23,6 +23,7 @@
 #include <linux/sysrq.h>
 #include <linux/init.h>
 #include <linux/nmi.h>
+#include <linux/console.h>
 
 #define PANIC_TIMER_STEP 100
 #define PANIC_BLINK_SPD 18
@@ -147,6 +148,13 @@ void panic(const char *fmt, ...)
 
 	bust_spinlocks(0);
 
+	/*
+	 * We may have ended up stopping the CPU doing printing (in
+	 * smp_send_stop()) while still having some valuable data in the
+	 * console buffer.  Flush it out.
+	 */
+	printk_log_buf_drain(true);
+
 	if (!panic_blink)
 		panic_blink = no_blink;
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8dc6c146d022..e404c429fe87 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2495,11 +2495,14 @@ struct tty_driver *console_device(int *index)
  * console. Note that as soon as this function returns, new messages may be
  * added to the printk buffer by other CPUs.
  */
-void printk_log_buf_drain(void)
+void printk_log_buf_drain(bool panic)
 {
 	bool retry;
 	unsigned long flags;
 
+	if (panic)
+		zap_locks();
+
 	while (1) {
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		retry = console_seq != log_next_seq;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e9496b4a3825..50a03735893e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -550,7 +550,7 @@ static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cp
 	 * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the
 	 * printing here to avoid these.
 	 */
-	printk_log_buf_drain();
+	printk_log_buf_drain(false);
 
 	/* Set the initial state and stop all online cpus. */
 	set_state(&msdata, MULTI_STOP_PREPARE);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/7] printk: Add config option for disabling printk offloading
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
                   ` (3 preceding siblings ...)
  2015-10-26  4:52 ` [PATCH 4/7] panic: Always flush printk buffer before panic Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2015-10-26  4:52 ` [PATCH 6/7] printk: Avoid scheduling printing threads on the same CPU Jan Kara
  5 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Necessity for offloading of printing was observed only for large
systems. So add a config option (disabled by default) which removes most
of the overhead added by this functionality.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt | 13 +++++++------
 init/Kconfig                        | 14 ++++++++++++++
 kernel/printk/printk.c              | 35 +++++++++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index df8adee975ba..913c166fdfea 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2958,18 +2958,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
-	printk.offload_chars=
+	printk.offload_chars=	[KNL]
 			Printing to console can be relatively slow especially
 			in case of serial console. When there is intensive
 			printing happening from several cpus (as is the case
 			during boot), a cpu can be spending significant time
 			(seconds or more) doing printing. To avoid softlockups,
 			lost interrupts, and similar problems other cpus
-			will take over printing after the currently printing
-			cpu has printed 'printk.offload_chars' characters.
-			Higher value means possibly longer interrupt and other
-			latencies but lower overhead of printing due to handing
-			over of printing.
+			will take over printing (if CONFIG_PRINTK_OFFLOAD=y)
+			after the currently printing cpu has printed
+			'printk.offload_chars' characters. Higher value means
+			possibly longer interrupt and other latencies but
+			lower overhead of printing due to handing over of
+			printing.
 			Format: <number> (0 = disabled)
 			default: 1000
 
diff --git a/init/Kconfig b/init/Kconfig
index c24b6f767bf0..fa9749da5fc8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1456,6 +1456,20 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_OFFLOAD
+	default n
+	bool "Enable support for offloading printing to different CPU"
+	depends on PRINTK && SMP
+	help
+	  Printing to console can be relatively slow especially in case of
+	  serial console. On large machines when there is intensive printing
+	  happening from several cpus (as is the case during boot), a cpu can
+	  be spending significant time (seconds or more) doing printing. To
+	  avoid softlockups, lost interrupts, and similar problems other cpus
+	  will take over printing after the currently printing cpu has printed
+	  certain number of characters (tunable via 'printk.offload_chars'
+	  kernel parameter).
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e404c429fe87..5153c6518b9d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -79,6 +79,7 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 /*
  * This spinlock is taken when printing to console. It is used only so that
  * we can spin on it when some other thread wants to take over printing to
@@ -105,6 +106,7 @@ static DEFINE_MUTEX(printk_kthread_mutex);
 
 /* Wait queue printing kthreads sleep on when idle */
 static DECLARE_WAIT_QUEUE_HEAD(print_queue);
+#endif	/* CONFIG_PRINTK_OFFLOAD */
 
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
@@ -308,6 +310,7 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 static int offload_chars_set(const char *val, const struct kernel_param *kp);
 static struct kernel_param_ops offload_chars_ops = {
 	.set = offload_chars_set,
@@ -326,6 +329,7 @@ module_param_cb(offload_chars, &offload_chars_ops, &printk_offload_chars,
 		   S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
 	" cpu after this number of characters");
+#endif
 
 /* Return log buffer address */
 char *log_buf_addr_get(void)
@@ -2255,6 +2259,7 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 /*
  * Returns true iff there is other cpu waiting to take over printing. This
  * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
@@ -2279,6 +2284,23 @@ static bool cpu_stop_printing(int printed_chars)
 	return false;
 }
 
+#define spin_lock_print_lock(flags) spin_lock_irqsave(&print_lock, flags)
+
+#define spin_unlock_print_lock(flags) spin_unlock_irqrestore(&print_lock, flags)
+
+#else
+
+static bool cpu_stop_printing(int printed_chars)
+{
+	return false;
+}
+
+#define spin_lock_print_lock(flags) local_irq_save(flags)
+
+#define spin_unlock_print_lock(flags) local_irq_restore(flags)
+
+#endif
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2317,7 +2339,7 @@ void console_unlock(void)
 	console_cont_flush(text, sizeof(text));
 again:
 	retry = false;
-	spin_lock_irqsave(&print_lock, flags);
+	spin_lock_print_lock(flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2416,7 +2438,7 @@ skip:
 	 * succeeds in getting console_sem (unless someone else takes it and
 	 * then he'll be responsible for printing).
          */
-	spin_unlock_irqrestore(&print_lock, flags);
+	spin_unlock_print_lock(flags);
 
 	/*
 	 * In case we cannot trylock the console_sem again, there's a new owner
@@ -2776,6 +2798,7 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 /* Kthread which takes over printing from a CPU which asks for help */
 static int printing_task(void *arg)
 {
@@ -2868,6 +2891,14 @@ static void printk_offload_init(void)
 	mutex_unlock(&printk_kthread_mutex);
 }
 
+#else	/* CONFIG_PRINTK_OFFLOAD */
+
+static void printk_offload_init(void)
+{
+}
+
+#endif	/* CONFIG_PRINTK_OFFLOAD */
+
 static int __init printk_late_init(void)
 {
 	struct console *con;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/7] printk: Avoid scheduling printing threads on the same CPU
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
                   ` (4 preceding siblings ...)
  2015-10-26  4:52 ` [PATCH 5/7] printk: Add config option for disabling printk offloading Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  5 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

Currently nothing forces the scheduler to schedule printing kthread on
the same CPU that is currently doing printing. In fact in some KVM
configurations this seems to happen rather frequently and it defeats
printing offloading since the current CPU is doing printing and watching
for printing kthread to come and take over however that never happens
because that kthread has been scheduled on the very same CPU.

Fix the problem by allowing each printing kthread to be scheduled only
on a subset of CPUs and these subsets are disjoint so at least one of
the kthreads is guaranteed to be able to take over printing. CPU hotplug
makes this more difficult than it should be but we cope by
redistributing kthreads among CPUs when some kthread is not able to run
anywhere.

Signed-off-by: Jan Kara <jack@suse.com>
---
 kernel/printk/printk.c | 105 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 96 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5153c6518b9d..72334ed42942 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -101,8 +101,10 @@ static atomic_t printing_tasks_spinning = ATOMIC_INIT(0);
 #define PRINTING_TASKS 2
 /* Pointers to printing kthreads */
 static struct task_struct *printing_kthread[PRINTING_TASKS];
+/* Masks of cpus allowed for printing kthreads */
+static struct cpumask *printing_kthread_mask[PRINTING_TASKS];
 /* Serialization of changes to printk_offload_chars and kthread creation */
-static DEFINE_MUTEX(printk_kthread_mutex);
+static DEFINE_MUTEX(printing_kthread_mutex);
 
 /* Wait queue printing kthreads sleep on when idle */
 static DECLARE_WAIT_QUEUE_HEAD(print_queue);
@@ -2840,28 +2842,113 @@ static int printing_task(void *arg)
 	return 0;
 }
 
+/* Divide online cpus among printing kthreads */
+static void distribute_printing_kthreads(void)
+{
+	int i;
+	unsigned int cpus_per_thread;
+	unsigned int cpu, seen_cpu;
+
+	for (i = 0; i < PRINTING_TASKS; i++)
+		cpumask_clear(printing_kthread_mask[i]);
+
+	cpus_per_thread = DIV_ROUND_UP(num_online_cpus(), PRINTING_TASKS);
+	seen_cpu = 0;
+	for_each_online_cpu(cpu) {
+		cpumask_set_cpu(cpu,
+			printing_kthread_mask[seen_cpu / cpus_per_thread]);
+		seen_cpu++;
+	}
+
+	for (i = 0; i < PRINTING_TASKS; i++)
+		if (!cpumask_empty(printing_kthread_mask[i]))
+			set_cpus_allowed_ptr(printing_kthread[i],
+					     printing_kthread_mask[i]);
+}
+
+static int printing_kthread_cpu_notify(struct notifier_block *nfb,
+				       unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	int i;
+
+	if (printk_offload_chars == 0)
+		goto out;
+
+	/* Get exclusion against turning of printk offload off... */
+	mutex_lock(&printing_kthread_mutex);
+	/* Now a reliable check if printk offload is enabled */
+	if (printk_offload_chars == 0) {
+		mutex_unlock(&printing_kthread_mutex);
+		goto out;
+	}
+
+	if (action == CPU_ONLINE) {
+		/*
+		 * Allow some task to use the CPU. We don't want to spend too
+		 * much time with fair distribution so just guess. We do a fair
+		 * redistribution if some task has no cpu to run on.
+		 */
+		i = cpu % PRINTING_TASKS;
+		cpumask_set_cpu(cpu, printing_kthread_mask[i]);
+		set_cpus_allowed_ptr(printing_kthread[i],
+				     printing_kthread_mask[i]);
+	}
+	if (action == CPU_DEAD) {
+
+		for (i = 0; i < PRINTING_TASKS; i++) {
+			if (cpumask_test_cpu(cpu, printing_kthread_mask[i])) {
+				cpumask_clear_cpu(cpu,
+						  printing_kthread_mask[i]);
+				if (cpumask_empty(printing_kthread_mask[i]))
+					distribute_printing_kthreads();
+				break;
+			}
+		}
+	}
+	mutex_unlock(&printing_kthread_mutex);
+out:
+	return NOTIFY_OK;
+}
+
 static int printk_start_offload_kthreads(void)
 {
 	int i;
 	struct task_struct *task;
+	int ret;
 
 	/* Does handover of printing make any sense? */
 	if (printk_offload_chars == 0 || num_possible_cpus() <= 1)
 		return 0;
+
 	for (i = 0; i < PRINTING_TASKS; i++) {
 		if (printing_kthread[i])
 			continue;
+		printing_kthread_mask[i] = kmalloc(cpumask_size(), GFP_KERNEL);
+		if (!printing_kthread_mask[i]) {
+			pr_err("printk: Cannot allocate cpumask for printing "
+			       "thread.\n");
+			ret = -ENOMEM;
+			goto out_err;
+		}
 		task = kthread_run(printing_task, NULL, "print/%d", i);
-		if (IS_ERR(task))
+		if (IS_ERR(task)) {
+			kfree(printing_kthread_mask[i]);
+			pr_err("printk: Cannot create printing thread: %ld\n",
+			       PTR_ERR(task));
+			ret = PTR_ERR(task);
 			goto out_err;
+		}
 		printing_kthread[i] = task;
 	}
+
+	hotcpu_notifier(printing_kthread_cpu_notify, 0);
+	distribute_printing_kthreads();
 	return 0;
 out_err:
-	pr_err("printk: Cannot create printing thread: %ld\n", PTR_ERR(task));
 	/* Disable offloading if creating kthreads failed */
 	printk_offload_chars = 0;
-	return PTR_ERR(task);
+	return ret;
 }
 
 static int offload_chars_set(const char *val, const struct kernel_param *kp)
@@ -2869,26 +2956,26 @@ static int offload_chars_set(const char *val, const struct kernel_param *kp)
 	int ret;
 
 	/* Protect against parallel change of printk_offload_chars */
-	mutex_lock(&printk_kthread_mutex);
+	mutex_lock(&printing_kthread_mutex);
 	ret = param_set_uint(val, kp);
 	if (ret) {
-		mutex_unlock(&printk_kthread_mutex);
+		mutex_unlock(&printing_kthread_mutex);
 		return ret;
 	}
 	ret = printk_start_offload_kthreads();
-	mutex_unlock(&printk_kthread_mutex);
+	mutex_unlock(&printing_kthread_mutex);
 	return ret;
 }
 
 static void printk_offload_init(void)
 {
-	mutex_lock(&printk_kthread_mutex);
+	mutex_lock(&printing_kthread_mutex);
 	if (num_possible_cpus() <= 1) {
 		/* Offloading doesn't make sense. Disable print offloading. */
 		printk_offload_chars = 0;
 	} else
 		printk_start_offload_kthreads();
-	mutex_unlock(&printk_kthread_mutex);
+	mutex_unlock(&printing_kthread_mutex);
 }
 
 #else	/* CONFIG_PRINTK_OFFLOAD */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing
  2015-10-26  4:52 ` [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
@ 2015-10-26  4:56   ` Jan Kara
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2015-10-26  4:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

  Hmph, sorry for the x/7 numbering. Patch 7 was the debug patch which I
didn't send...

								Honza

On Mon 26-10-15 05:52:46, Jan Kara wrote:
> From: Jan Kara <jack@suse.cz>
> 
> When there are lots of messages accumulated in printk buffer, printing
> them (especially over serial console) can take a long time (tens of
> seconds). stop_machine() will effectively make all cpus spin in
> multi_cpu_stop() waiting for the CPU doing printing to print all the
> messages which triggers NMI softlockup watchdog and RCU stall detector
> which add even more to the messages to print. Since machine doesn't do
> anything (except serving interrupts) during this time, also network
> connections are dropped and other disturbances may happen.
> 
> Paper over the problem by waiting for printk buffer to be empty before
> starting to stop CPUs. In theory a burst of new messages can be appended
> to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100%
> solution but it works OK in practice and I'm not aware of a reasonably
> simple better solution.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  include/linux/console.h | 11 +++++++++++
>  kernel/printk/printk.c  | 25 +++++++++++++++++++++++++
>  kernel/stop_machine.c   |  9 +++++++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index bd194343c346..96da462cdfeb 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -150,6 +150,17 @@ extern int console_trylock(void);
>  extern void console_unlock(void);
>  extern void console_conditional_schedule(void);
>  extern void console_unblank(void);
> +#ifdef CONFIG_SMP
> +extern void printk_log_buf_drain(void);
> +#else
> +/*
> + * In non-SMP kernels there won't be much to drain so save some code for tiny
> + * kernels.
> + */
> +static inline void printk_log_buf_drain(void)
> +{
> +}
> +#endif
>  extern struct tty_driver *console_device(int *);
>  extern void console_stop(struct console *);
>  extern void console_start(struct console *);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b9bb4a7a6dff..8dc6c146d022 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2488,6 +2488,31 @@ struct tty_driver *console_device(int *index)
>  	return driver;
>  }
>  
> +/* For non-SMP kernels this function isn't used and would be pointless anyway */
> +#ifdef CONFIG_SMP
> +/*
> + * Wait until all messages accumulated in the printk buffer are printed to
> + * console. Note that as soon as this function returns, new messages may be
> + * added to the printk buffer by other CPUs.
> + */
> +void printk_log_buf_drain(void)
> +{
> +	bool retry;
> +	unsigned long flags;
> +
> +	while (1) {
> +		raw_spin_lock_irqsave(&logbuf_lock, flags);
> +		retry = console_seq != log_next_seq;
> +		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +		if (!retry || console_suspended)
> +			break;
> +		/* Cycle console_sem to wait for outstanding printing */
> +		console_lock();
> +		console_unlock();
> +	}
> +}
> +#endif
> +
>  /*
>   * Prevent further output on the passed console device so that (for example)
>   * serial drivers can disable console output before suspending a port, and can
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5d5c88..e9496b4a3825 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -21,6 +21,7 @@
>  #include <linux/smpboot.h>
>  #include <linux/atomic.h>
>  #include <linux/lglock.h>
> +#include <linux/console.h>
>  
>  /*
>   * Structure to determine completion condition and record errors.  May
> @@ -543,6 +544,14 @@ static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cp
>  		return ret;
>  	}
>  
> +	/*
> +	 * If there are lots of outstanding messages, printing them can take a
> +	 * long time and all cpus would be spinning waiting for the printing to
> +	 * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the
> +	 * printing here to avoid these.
> +	 */
> +	printk_log_buf_drain();
> +
>  	/* Set the initial state and stop all online cpus. */
>  	set_state(&msdata, MULTI_STOP_PREPARE);
>  	return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
> -- 
> 2.1.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
@ 2016-03-01 17:22   ` Denys Vlasenko
  2016-03-02  9:30     ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Denys Vlasenko @ 2016-03-01 17:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, LKML, pmladek, KY Srinivasan, Steven Rostedt, Jan Kara

On Mon, Oct 26, 2015 at 5:52 AM, Jan Kara <jack@suse.com> wrote:
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by waking up one of dedicated kthreads.  As
> soon as the printing CPU notices kthread got scheduled and is spinning
> on print_lock dedicated for that purpose, it drops console_sem,
> print_lock, and exits console_unlock(). Kthread then takes over printing
> instead. This way no CPU should spend printing too long even if there
> is heavy printk traffic.


> +/*
> + * Number of kernel threads for offloading printing. We need at least two so
> + * that they can hand over printing from one to another one and thus switch
> + * CPUs.
> + */
> +#define PRINTING_TASKS 2
> +
> +/* Wait queue printing kthreads sleep on when idle */
> +static DECLARE_WAIT_QUEUE_HEAD(print_queue);

Having two tasks, not one, for printking for the case
when console output is slow... sounds wasteful.

Can this be improved so that only one task is needed?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-03-01 17:22   ` Denys Vlasenko
@ 2016-03-02  9:30     ` Jan Kara
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2016-03-02  9:30 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Jan Kara, Andrew Morton, LKML, pmladek, KY Srinivasan,
	Steven Rostedt, Jan Kara

On Tue 01-03-16 18:22:25, Denys Vlasenko wrote:
> On Mon, Oct 26, 2015 at 5:52 AM, Jan Kara <jack@suse.com> wrote:
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> 
> 
> > +/*
> > + * Number of kernel threads for offloading printing. We need at least two so
> > + * that they can hand over printing from one to another one and thus switch
> > + * CPUs.
> > + */
> > +#define PRINTING_TASKS 2
> > +
> > +/* Wait queue printing kthreads sleep on when idle */
> > +static DECLARE_WAIT_QUEUE_HEAD(print_queue);
> 
> Having two tasks, not one, for printking for the case
> when console output is slow... sounds wasteful.
> 
> Can this be improved so that only one task is needed?

Probably we'll go with workqueue in the next version of the patch series.
But at least in this version you needed two tasks so that one task can hand
over printing to the other one and thus relieve the load from a CPU.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-11 12:54   ` Petr Mladek
@ 2016-01-12 14:00     ` Jan Kara
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2016-01-12 14:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Mon 11-01-16 13:54:34, Petr Mladek wrote:
> On Tue 2015-12-22 14:47:30, Jan Kara wrote:
> > On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> > > *** in this email and in every later emails ***
> > Over last few days I have worked on redoing the stuff as we
> > discussed with Linus and Andrew at Kernel Summit and I have new patches
> > which are working fine for me. I still want to test them on some machines
> > having real issues with udev during boot but so far stress-testing with
> > serial console slowed down to ~1000 chars/sec on other machines and VMs
> > looks promising.
> > 
> > I'm attaching them in case you want to have a look. They are on top of
> > Tejun's patch adding cond_resched() (which is essential). I'll officially
> > submit the patches once the testing is finished (but I'm not sure when I
> > get to the problematic HW...).
> > 
> > [1] http://www.spinics.net/lists/stable/msg111535.html
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> > >From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 21 Dec 2015 13:10:31 +0100
> > Subject: [PATCH 1/2] printk: Make printk() completely async
> > 
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 
> I wonder if it is safe to use sync_print for early messages
> from the scheduler. Well, in this case, we might need to print
> the messages directly from the irq context because the system
> workqueue is not ready yet :-(

Yeah, you are right that early scheduler messages can currently result in a
deadlock. I'll handle this case. Thanks for catching this.

> > +	} else
> > +		sync_print = true;
> >  	local_irq_restore(flags);
> >  
> > -	/* If called from the scheduler, we can not call up(). */
> > -	if (!in_sched) {
> > +	if (sync_print) {
> >  		lockdep_off();
> >  		/*
> >  		 * Disable preemption to avoid being preempted while holding
> 
> > >From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 21 Dec 2015 14:26:13 +0100
> > Subject: [PATCH 2/2] printk: Skip messages on oops
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index d455d1bd0d2c..fc67ab70e9c7 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -262,6 +262,9 @@ static u64 console_seq;
> >  static u32 console_idx;
> >  static enum log_flags console_prev;
> >  
> > +/* current record sequence when oops happened */
> > +static u64 oops_start_seq;
> > +
> >  /* the next printk record to read after the last 'clear' command */
> >  static u64 clear_seq;
> >  static u32 clear_idx;
> > @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  					 NULL, 0, recursion_msg,
> >  					 strlen(recursion_msg));
> >  	}
> > +	if (oops_in_progress && !sync_print && !oops_start_seq)
> > +		oops_start_seq = log_next_seq;
> 
> sync_print is false for scheduler messages here. Therefore we might
> skip messages even when user set printk.synchronous on the
> command line.

Ah, right. I'll fix this. Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06 12:25         ` Jan Kara
@ 2016-01-11 13:25           ` Sergey Senozhatsky
  0 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-11 13:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

Hello Jan,

On (01/06/16 13:25), Jan Kara wrote:
[..]
> > a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html
[..]
> > particularly this "workqueue context is not appropriate because all the workers
> > might be busy (e.g.  allocating memory)" part. I think printk should switch to
> > sync mode in this case, since printk now does queue_work(system_wq, work).
> > um... console_verbose() call from oom kill? but it'll be nice to return back
> > to async mode once (if) memory pressure goes away.
> 
> Hum, yes, some mechanism to switch to sync printing in case work cannot be
> executed for a long time is probably needed. I'll think about it.

well, technically, worker_pool keeps ->watchdog_ts updated, so ,basically,
worker pool knows when it stall. with CONFIG_WQ_WATCHDOG enabled timer_fn
wq_watchdog_timer_fn() checks that value and pr_emerg(). in the worst case,
printk can depend on CONFIG_WQ_WATCHDOG (yes, this sounds a bit sad) --
which implies, however, potentially long print from timer_fn. having a
printk() specific timer_fn, that will do the same, is just a duplication of
functionality; and checking the value in every vprintk_emit() is not really
an option too, I'm afraid, there may be no printk calls for some time.
just my 5 cents. probably you have better ideas.


one another thing, include/linux/workqueue.h says

 : System-wide workqueues which are always present.
 :
 : system_wq is the one used by schedule[_delayed]_work[_on]().
 : Multi-CPU multi-threaded.  There are users which expect relatively
 : short queue flush time.  Don't queue works which can run for too
 : long.
 :
[..]
 :
 : system_long_wq is similar to system_wq but may host long running
 : works.  Queue flushing might take relatively long.
 :
 : system_unbound_wq is unbound workqueue.  Workers are not bound to
 : any specific CPU, not concurrency managed, and all queued works are
 : executed immediately as long as max_active limit is not reached and
 : resources are available.

wake_up_klogd_work_func() is using `system_wq' to do
'console_lock()/console_unlock()', both of which can take a long time.
should it be switched to `system_long_wq' or `system_unbound_wq'?

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
                     ` (2 preceding siblings ...)
  2015-12-31  2:44   ` Sergey Senozhatsky
@ 2016-01-11 12:54   ` Petr Mladek
  2016-01-12 14:00     ` Jan Kara
  3 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-01-11 12:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Tue 2015-12-22 14:47:30, Jan Kara wrote:
> On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> > *** in this email and in every later emails ***
> Over last few days I have worked on redoing the stuff as we
> discussed with Linus and Andrew at Kernel Summit and I have new patches
> which are working fine for me. I still want to test them on some machines
> having real issues with udev during boot but so far stress-testing with
> serial console slowed down to ~1000 chars/sec on other machines and VMs
> looks promising.
> 
> I'm attaching them in case you want to have a look. They are on top of
> Tejun's patch adding cond_resched() (which is essential). I'll officially
> submit the patches once the testing is finished (but I'm not sure when I
> get to the problematic HW...).
> 
> [1] http://www.spinics.net/lists/stable/msg111535.html
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> >From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 13:10:31 +0100
> Subject: [PATCH 1/2] printk: Make printk() completely async
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));

I wonder if it is safe to use sync_print for early messages
from the scheduler. Well, in this case, we might need to print
the messages directly from the irq context because the system
workqueue is not ready yet :-(


> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Disable preemption to avoid being preempted while holding

> >From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 14:26:13 +0100
> Subject: [PATCH 2/2] printk: Skip messages on oops
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d455d1bd0d2c..fc67ab70e9c7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -262,6 +262,9 @@ static u64 console_seq;
>  static u32 console_idx;
>  static enum log_flags console_prev;
>  
> +/* current record sequence when oops happened */
> +static u64 oops_start_seq;
> +
>  /* the next printk record to read after the last 'clear' command */
>  static u64 clear_seq;
>  static u32 clear_idx;
> @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  					 NULL, 0, recursion_msg,
>  					 strlen(recursion_msg));
>  	}
> +	if (oops_in_progress && !sync_print && !oops_start_seq)
> +		oops_start_seq = log_next_seq;

sync_print is false for scheduler messages here. Therefore we might
skip messages even when user set printk.synchronous on the
command line.

Otherwise, the patch set looks rather straightforward.

Best Regards,
Petr

>  	/*
>  	 * The printf needs to come first; we need the syslog
> @@ -2292,6 +2297,12 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * When oops happens and there are more messages to be printed in the printk
> + * buffer that this, skip some mesages and print only this many newest messages.
> + */
> +#define PRINT_MSGS_BEFORE_OOPS 100
> +
>  /**
>   * console_unlock - unlock the console system
>   *

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  6:48       ` Sergey Senozhatsky
@ 2016-01-06 12:25         ` Jan Kara
  2016-01-11 13:25           ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2016-01-06 12:25 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On Wed 06-01-16 15:48:36, Sergey Senozhatsky wrote:
> On (01/05/16 15:37), Jan Kara wrote:
> > > How about setting 'sync_print' to 'true' in...
> > >   bust_spinlocks() /* only set to true */
> > > or
> > >   console_verbose() /* um... may be... */
> > > or
> > >   having a separate one-liner for that
> > > 
> > > void console_panic_mode(void)
> > > {
> > > 	sync_print = true;
> > > }
> > > 
> > > and call it early in panic(), before we send out IPI_STOP.
> > 
> > I like using console_verbose() for setting sync_print to true. That will
> > likely be more reliable than using oops in progress. After all
> > console_verbose() is used like console_panic_mode() anyway and in quite a
> > few places so it is a reasonable match.
> 
> another corner case.
> 
> a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html
> 
> :    This patch reduces the probability of such a lockup by introducing a
> :    specialized kernel thread (oom_reaper) which tries to reclaim additional
> :    memory by preemptively reaping the anonymous or swapped out memory owned
> :    by the oom victim under an assumption that such a memory won't be needed
> :    when its owner is killed and kicked from the userspace anyway.  There is
> :    one notable exception to this, though, if the OOM victim was in the
> :    process of coredumping the result would be incomplete.  This is considered
> :    a reasonable constrain because the overall system health is more important
> :    than debugability of a particular application.
> :
> :    A kernel thread has been chosen because we need a reliable way of
> :    invocation so workqueue context is not appropriate because all the workers
> :    might be busy (e.g.  allocating memory).  Kswapd which sounds like another
> :    good fit is not appropriate as well because it might get blocked on locks
> :    during reclaim as well.
> 
> particularly this "workqueue context is not appropriate because all the workers
> might be busy (e.g.  allocating memory)" part. I think printk should switch to
> sync mode in this case, since printk now does queue_work(system_wq, work).
> um... console_verbose() call from oom kill? but it'll be nice to return back
> to async mode once (if) memory pressure goes away.

Hum, yes, some mechanism to switch to sync printing in case work cannot be
executed for a long time is probably needed. I'll think about it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06 10:21               ` Jan Kara
@ 2016-01-06 11:10                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06 11:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On (01/06/16 11:21), Jan Kara wrote:
[..]
> > 
> > or like this on another setup ([k|u]_ts updated to u64)
> > 
> > # cat /proc/1/time_in_console_unlock
> > kern:[12.755920] user:[38.367332]
> 
> So maybe that is worth addressing if it bothers you but please as a
> separate patch set. This seems fairly independent and I think even current
> version of the patches will be controversial enough...

Agree

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  8:36             ` Sergey Senozhatsky
@ 2016-01-06 10:21               ` Jan Kara
  2016-01-06 11:10                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2016-01-06 10:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Wed 06-01-16 17:36:53, Sergey Senozhatsky wrote:
> On (01/06/16 12:38), Sergey Senozhatsky wrote:
> > On (01/05/16 15:48), Jan Kara wrote:
> > > > [..]
> > > > > cond_resched() does its job there, of course. well, a user process still can
> > > > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > > > 
> > > > attached two patches, I ended up having on top of yours. just in case.
> > > > 
> > > >     printk: factor out can_printk_async()
> > > >     
> > > >     console_unlock() can be called directly or indirectly by a user
> > > >     space process, so it can end up doing call_console_drivers() loop,
> > > >     which will hold it from returning back to user-space from a syscall
> > > >     for unpredictable amount of time.
> > > >     
> > > >     Factor out can_printk_async() function, which queues an irq work and
> > > >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> > > >     vprintk_emit() already does it, add can_printk_async() call to
> > > >     console_unlock() for !PF_KTHREAD processes.
> > > 
> > > I'd be cautious about changing this userspace visible behavior. Someone may
> > > be relying on it... I agree that sometimes we can block userspace process
> > > in kernel for a long time (e.g. in my testing I often see syslog process
> > > doing the printing) but so far I didn't see / was notified about some real
> > > problem with this. So unless I see some real user issues with user
> > > processes doing printing for too long I would not touch this.
> > 
> > and w/o a lot of effort (no heavy printk message traffic)
> 
> or like this on another setup ([k|u]_ts updated to u64)
> 
> # cat /proc/1/time_in_console_unlock
> kern:[12.755920] user:[38.367332]

So maybe that is worth addressing if it bothers you but please as a
separate patch set. This seems fairly independent and I think even current
version of the patches will be controversial enough...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  3:38           ` Sergey Senozhatsky
@ 2016-01-06  8:36             ` Sergey Senozhatsky
  2016-01-06 10:21               ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  8:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (01/06/16 12:38), Sergey Senozhatsky wrote:
> On (01/05/16 15:48), Jan Kara wrote:
> > > [..]
> > > > cond_resched() does its job there, of course. well, a user process still can
> > > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > > 
> > > attached two patches, I ended up having on top of yours. just in case.
> > > 
> > >     printk: factor out can_printk_async()
> > >     
> > >     console_unlock() can be called directly or indirectly by a user
> > >     space process, so it can end up doing call_console_drivers() loop,
> > >     which will hold it from returning back to user-space from a syscall
> > >     for unpredictable amount of time.
> > >     
> > >     Factor out can_printk_async() function, which queues an irq work and
> > >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> > >     vprintk_emit() already does it, add can_printk_async() call to
> > >     console_unlock() for !PF_KTHREAD processes.
> > 
> > I'd be cautious about changing this userspace visible behavior. Someone may
> > be relying on it... I agree that sometimes we can block userspace process
> > in kernel for a long time (e.g. in my testing I often see syslog process
> > doing the printing) but so far I didn't see / was notified about some real
> > problem with this. So unless I see some real user issues with user
> > processes doing printing for too long I would not touch this.
> 
> and w/o a lot of effort (no heavy printk message traffic)

or like this on another setup ([k|u]_ts updated to u64)

# cat /proc/1/time_in_console_unlock
kern:[12.755920] user:[38.367332]

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:37     ` Jan Kara
  2016-01-06  1:41       ` Sergey Senozhatsky
@ 2016-01-06  6:48       ` Sergey Senozhatsky
  2016-01-06 12:25         ` Jan Kara
  1 sibling, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  6:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

On (01/05/16 15:37), Jan Kara wrote:
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;
> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.
> 
> I like using console_verbose() for setting sync_print to true. That will
> likely be more reliable than using oops in progress. After all
> console_verbose() is used like console_panic_mode() anyway and in quite a
> few places so it is a reasonable match.

another corner case.

a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html

:    This patch reduces the probability of such a lockup by introducing a
:    specialized kernel thread (oom_reaper) which tries to reclaim additional
:    memory by preemptively reaping the anonymous or swapped out memory owned
:    by the oom victim under an assumption that such a memory won't be needed
:    when its owner is killed and kicked from the userspace anyway.  There is
:    one notable exception to this, though, if the OOM victim was in the
:    process of coredumping the result would be incomplete.  This is considered
:    a reasonable constrain because the overall system health is more important
:    than debugability of a particular application.
:
:    A kernel thread has been chosen because we need a reliable way of
:    invocation so workqueue context is not appropriate because all the workers
:    might be busy (e.g.  allocating memory).  Kswapd which sounds like another
:    good fit is not appropriate as well because it might get blocked on locks
:    during reclaim as well.

particularly this "workqueue context is not appropriate because all the workers
might be busy (e.g.  allocating memory)" part. I think printk should switch to
sync mode in this case, since printk now does queue_work(system_wq, work).
um... console_verbose() call from oom kill? but it'll be nice to return back
to async mode once (if) memory pressure goes away.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:48         ` Jan Kara
@ 2016-01-06  3:38           ` Sergey Senozhatsky
  2016-01-06  8:36             ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  3:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hello,

On (01/05/16 15:48), Jan Kara wrote:
> > [..]
> > > cond_resched() does its job there, of course. well, a user process still can
> > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > 
> > attached two patches, I ended up having on top of yours. just in case.
> > 
> >     printk: factor out can_printk_async()
> >     
> >     console_unlock() can be called directly or indirectly by a user
> >     space process, so it can end up doing call_console_drivers() loop,
> >     which will hold it from returning back to user-space from a syscall
> >     for unpredictable amount of time.
> >     
> >     Factor out can_printk_async() function, which queues an irq work and
> >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> >     vprintk_emit() already does it, add can_printk_async() call to
> >     console_unlock() for !PF_KTHREAD processes.
> 
> I'd be cautious about changing this userspace visible behavior. Someone may
> be relying on it... I agree that sometimes we can block userspace process
> in kernel for a long time (e.g. in my testing I often see syslog process
> doing the printing) but so far I didn't see / was notified about some real
> problem with this. So unless I see some real user issues with user
> processes doing printing for too long I would not touch this.

hm, interesting point.

/* random thoughts, I'm still on sick leave */

do we really have a user visible behaviour here that it really wants to have?
a task that does tty_open, for instance, hardly wants to end up doing a bunch
of call_console_drivers() calls in console_unlock(). it does look to me mostly
as unexpected side effect.

I probably can imagine someone writing a /usr/bin/flush_logs_to_serial
app that specifically depends on that behaviour, but that will require a bit
of hackery and trickery, since (seems) there is no syscall this app can call
that will perform *only* the required action:

	void force_flush_logs_to_serial(void)
	{
		console_lock();
		console_unlock();
	}

returning back to user space from syscalls quicker is a good thing, I'd
prefer user space apps to do less kernel job (by kernel job I mean
call_console_drivers() loop). well, at least on admittedly weird setups
that I have to deal with. but I may be missing something here.

some numbers

I added global `unsigned long k_ts, u_ts;' to accumulate time spent
in console_unlock() by PF_KTHREAD and !PF_KTHREAD correspondingly.

void console_unlock(void)
{
	...
	s_ts = local_clock();
	console_cont_flush(text, sizeof(text));
again:

	...

	if (wake_klogd)
		wake_up_klogd();

	e_ts = local_clock();
	if (time_after(e_ts, s_ts)) {
		if (current->flags & PF_KTHREAD)
			k_ts += (e_ts - s_ts);
		else
			u_ts += (e_ts - s_ts);
	}
}

and a procfs file to read the values

         unsigned long k = k_ts;
         unsigned long u = u_ts;
         unsigned long rem_nsec_k = do_div(k, 1000000000);
         unsigned long rem_nsec_u = do_div(u, 1000000000);

         return sprintf(buf, "kern:[%lu.%06lu] user:[%lu.%06lu]\n",
                         k, rem_nsec_k / 1000,
                         u, rem_nsec_u / 1000);

and w/o a lot of effort (no heavy printk message traffic)

$ cat /proc/1/time_in_console_unlock
kern:[4.241475] user:[4.077787]

that's user space spent almost the same amount of time to print kernel
messages as the kernel did on its own. which is hard to formulate as an
issue, it's just user space was doing for 4 seconds something it was not
really meant to do (at least from user space app developer's point of
view); so there is an unpredictable additional cost X added to some of
the syscalls.

	-ss

> >     printk: introduce console_sync_mode
> >     
> >     console_sync_mode() should be called early in panic() to switch
> >     printk from async mode to sync. Otherwise, STOP IPIs can arrive
> >     to other CPUs too late and those CPUs will see oops_in_progress
> >     being 0 again.
> 
> So as I wrote, I like this in principle but there are much more places
> calling console_verbose() and all of them want console_sync_mode() as well.
> So I prefer hiding the sync printing in console_verbose() and possibly
> renaming it to something better but I'm not sure renaming is worth it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:37     ` Jan Kara
@ 2016-01-06  1:41       ` Sergey Senozhatsky
  2016-01-06  6:48       ` Sergey Senozhatsky
  1 sibling, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  1:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

Hello,

On (01/05/16 15:37), Jan Kara wrote:
> Hi,
> 
> On Wed 23-12-15 10:54:49, Sergey Senozhatsky wrote:
> > slowly looking through the patches.
> 
> Back from Christmas vacation...
> 
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;
> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.
> 
> I like using console_verbose() for setting sync_print to true. That will
> likely be more reliable than using oops in progress. After all
> console_verbose() is used like console_panic_mode() anyway and in quite a
> few places so it is a reasonable match.

Agree, only arch/microblaze/kernel/setup.c and arch/nios2/kernel/setup.c
do console_verbose() early in setup_arch(), the rest seems to be what I
was thinking of.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  4:58       ` Sergey Senozhatsky
@ 2016-01-05 14:48         ` Jan Kara
  2016-01-06  3:38           ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2016-01-05 14:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Thu 31-12-15 13:58:59, Sergey Senozhatsky wrote:
> On (12/31/15 12:13), Sergey Senozhatsky wrote:
> [..]
> > cond_resched() does its job there, of course. well, a user process still can
> > do a lot of call_console_drivers() calls. may be we can check who is calling
> > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> 
> attached two patches, I ended up having on top of yours. just in case.
> 
>     printk: factor out can_printk_async()
>     
>     console_unlock() can be called directly or indirectly by a user
>     space process, so it can end up doing call_console_drivers() loop,
>     which will hold it from returning back to user-space from a syscall
>     for unpredictable amount of time.
>     
>     Factor out can_printk_async() function, which queues an irq work and
>     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
>     vprintk_emit() already does it, add can_printk_async() call to
>     console_unlock() for !PF_KTHREAD processes.

I'd be cautious about changing this userspace visible behavior. Someone may
be relying on it... I agree that sometimes we can block userspace process
in kernel for a long time (e.g. in my testing I often see syslog process
doing the printing) but so far I didn't see / was notified about some real
problem with this. So unless I see some real user issues with user
processes doing printing for too long I would not touch this.
 
> and
> 
>     printk: introduce console_sync_mode
>     
>     console_sync_mode() should be called early in panic() to switch
>     printk from async mode to sync. Otherwise, STOP IPIs can arrive
>     to other CPUs too late and those CPUs will see oops_in_progress
>     being 0 again.

So as I wrote, I like this in principle but there are much more places
calling console_verbose() and all of them want console_sync_mode() as well.
So I prefer hiding the sync printing in console_verbose() and possibly
renaming it to something better but I'm not sure renaming is worth it.

								Honza

> 
> 	-ss

> From c3fc955809adab8f497cdc7581e67e1fa29d6517 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Wed, 30 Dec 2015 20:39:12 +0900
> Subject: [PATCH 1/2] printk: introduce console_sync_mode
> 
> console_sync_mode() should be called early in panic() to switch
> printk from async mode to sync. Otherwise, STOP IPIs can arrive
> to other CPUs too late and those CPUs will see oops_in_progress
> being 0 again.
> ---
>  include/linux/console.h | 1 +
>  kernel/panic.c          | 1 +
>  kernel/printk/printk.c  | 5 +++++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index bd19434..f068985 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -150,6 +150,7 @@ extern int console_trylock(void);
>  extern void console_unlock(void);
>  extern void console_conditional_schedule(void);
>  extern void console_unblank(void);
> +extern void console_sync_mode(void);
>  extern struct tty_driver *console_device(int *);
>  extern void console_stop(struct console *);
>  extern void console_start(struct console *);
> diff --git a/kernel/panic.c b/kernel/panic.c
> index b333380..04c8ff4 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -117,6 +117,7 @@ void panic(const char *fmt, ...)
>  	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
>  		panic_smp_self_stop();
>  
> +	console_sync_mode();
>  	console_verbose();
>  	bust_spinlocks(1);
>  	va_start(args, fmt);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index de9d31b..47a70a2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2501,6 +2501,11 @@ void console_unblank(void)
>  	console_unlock();
>  }
>  
> +void console_sync_mode(void)
> +{
> +	printk_sync = true;
> +}
> +
>  /*
>   * Return the console tty driver structure and its associated index
>   */
> -- 
> 2.6.4
> 

> From 92f2c0f2a5ed015caa2757dcfec4407d708f8628 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Thu, 31 Dec 2015 13:39:58 +0900
> Subject: [PATCH 2/2] printk: factor out can_printk_async()
> 
> console_unlock() can be called directly or indirectly by a user
> space process, so it can end up doing call_console_drivers() loop,
> which will hold it from returning back to user-space from a syscall
> for unpredictable amount of time.
> 
> Factor out can_printk_async() function, which queues an irq work and
> sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> vprintk_emit() already does it, add can_printk_async() call to
> console_unlock() for !PF_KTHREAD processes.
> ---
>  kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 47a70a2..7d3a8e1 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -355,6 +355,26 @@ int printk_deferred(const char *fmt, ...)
>  	return r;
>  }
>  
> +static bool can_printk_async(bool sync)
> +{
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		return true;
> +	}
> +	return false;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1889,20 +1909,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> -	/*
> -	 * By default we print message to console asynchronously so that kernel
> -	 * doesn't get stalled due to slow serial console. That can lead to
> -	 * softlockups, lost interrupts, or userspace timing out under heavy
> -	 * printing load.
> -	 *
> -	 * However we resort to synchronous printing of messages during early
> -	 * boot, when oops is in progress, or when synchronous printing was
> -	 * explicitely requested by kernel parameter.
> -	 */
> -	if (keventd_up() && !oops_in_progress && !sync_print) {
> -		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	} else
> +	if (!can_printk_async(sync_print))
>  		sync_print = true;
>  	local_irq_restore(flags);
>  
> @@ -2328,6 +2335,13 @@ void console_unlock(void)
>  		return;
>  	}
>  
> +	if (!(current->flags & PF_KTHREAD) &&
> +			can_printk_async(printk_sync)) {
> +		console_locked = 0;
> +		up_console_sem();
> +		return;
> +	}
> +
>  	/*
>  	 * Console drivers are called under logbuf_lock, so
>  	 * @console_may_schedule should be cleared before; however, we may
> -- 
> 2.6.4
> 

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  1:54   ` Sergey Senozhatsky
  2015-12-23  3:37     ` Sergey Senozhatsky
@ 2016-01-05 14:37     ` Jan Kara
  2016-01-06  1:41       ` Sergey Senozhatsky
  2016-01-06  6:48       ` Sergey Senozhatsky
  1 sibling, 2 replies; 35+ messages in thread
From: Jan Kara @ 2016-01-05 14:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

Hi,

On Wed 23-12-15 10:54:49, Sergey Senozhatsky wrote:
> slowly looking through the patches.

Back from Christmas vacation...

> How about setting 'sync_print' to 'true' in...
>   bust_spinlocks() /* only set to true */
> or
>   console_verbose() /* um... may be... */
> or
>   having a separate one-liner for that
> 
> void console_panic_mode(void)
> {
> 	sync_print = true;
> }
> 
> and call it early in panic(), before we send out IPI_STOP.

I like using console_verbose() for setting sync_print to true. That will
likely be more reliable than using oops in progress. After all
console_verbose() is used like console_panic_mode() anyway and in quite a
few places so it is a reasonable match.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  3:13     ` Sergey Senozhatsky
@ 2015-12-31  4:58       ` Sergey Senozhatsky
  2016-01-05 14:48         ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  4:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On (12/31/15 12:13), Sergey Senozhatsky wrote:
[..]
> cond_resched() does its job there, of course. well, a user process still can
> do a lot of call_console_drivers() calls. may be we can check who is calling
> console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> test) AND a user process then return from console_unlock() doing irq_work_queue()
> and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.

attached two patches, I ended up having on top of yours. just in case.

    printk: factor out can_printk_async()
    
    console_unlock() can be called directly or indirectly by a user
    space process, so it can end up doing call_console_drivers() loop,
    which will hold it from returning back to user-space from a syscall
    for unpredictable amount of time.
    
    Factor out can_printk_async() function, which queues an irq work and
    sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
    vprintk_emit() already does it, add can_printk_async() call to
    console_unlock() for !PF_KTHREAD processes.

and

    printk: introduce console_sync_mode
    
    console_sync_mode() should be called early in panic() to switch
    printk from async mode to sync. Otherwise, STOP IPIs can arrive
    to other CPUs too late and those CPUs will see oops_in_progress
    being 0 again.

	-ss

[-- Attachment #2: 0001-printk-introduce-console_sync_mode.patch --]
[-- Type: text/x-diff, Size: 1791 bytes --]

>From c3fc955809adab8f497cdc7581e67e1fa29d6517 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Wed, 30 Dec 2015 20:39:12 +0900
Subject: [PATCH 1/2] printk: introduce console_sync_mode

console_sync_mode() should be called early in panic() to switch
printk from async mode to sync. Otherwise, STOP IPIs can arrive
to other CPUs too late and those CPUs will see oops_in_progress
being 0 again.
---
 include/linux/console.h | 1 +
 kernel/panic.c          | 1 +
 kernel/printk/printk.c  | 5 +++++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index bd19434..f068985 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -150,6 +150,7 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+extern void console_sync_mode(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index b333380..04c8ff4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -117,6 +117,7 @@ void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	console_sync_mode();
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de9d31b..47a70a2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2501,6 +2501,11 @@ void console_unblank(void)
 	console_unlock();
 }
 
+void console_sync_mode(void)
+{
+	printk_sync = true;
+}
+
 /*
  * Return the console tty driver structure and its associated index
  */
-- 
2.6.4


[-- Attachment #3: 0002-printk-factor-out-can_printk_async.patch --]
[-- Type: text/x-diff, Size: 3019 bytes --]

>From 92f2c0f2a5ed015caa2757dcfec4407d708f8628 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Thu, 31 Dec 2015 13:39:58 +0900
Subject: [PATCH 2/2] printk: factor out can_printk_async()

console_unlock() can be called directly or indirectly by a user
space process, so it can end up doing call_console_drivers() loop,
which will hold it from returning back to user-space from a syscall
for unpredictable amount of time.

Factor out can_printk_async() function, which queues an irq work and
sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
vprintk_emit() already does it, add can_printk_async() call to
console_unlock() for !PF_KTHREAD processes.
---
 kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 47a70a2..7d3a8e1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,6 +355,26 @@ int printk_deferred(const char *fmt, ...)
 	return r;
 }
 
+static bool can_printk_async(bool sync)
+{
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when oops is in progress, or when synchronous printing was
+	 * explicitely requested by kernel parameter.
+	 */
+	if (keventd_up() && !oops_in_progress && !sync) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		return true;
+	}
+	return false;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1889,20 +1909,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
-	/*
-	 * By default we print message to console asynchronously so that kernel
-	 * doesn't get stalled due to slow serial console. That can lead to
-	 * softlockups, lost interrupts, or userspace timing out under heavy
-	 * printing load.
-	 *
-	 * However we resort to synchronous printing of messages during early
-	 * boot, when oops is in progress, or when synchronous printing was
-	 * explicitely requested by kernel parameter.
-	 */
-	if (keventd_up() && !oops_in_progress && !sync_print) {
-		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	} else
+	if (!can_printk_async(sync_print))
 		sync_print = true;
 	local_irq_restore(flags);
 
@@ -2328,6 +2335,13 @@ void console_unlock(void)
 		return;
 	}
 
+	if (!(current->flags & PF_KTHREAD) &&
+			can_printk_async(printk_sync)) {
+		console_locked = 0;
+		up_console_sem();
+		return;
+	}
+
 	/*
 	 * Console drivers are called under logbuf_lock, so
 	 * @console_may_schedule should be cleared before; however, we may
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  2:44   ` Sergey Senozhatsky
@ 2015-12-31  3:13     ` Sergey Senozhatsky
  2015-12-31  4:58       ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  3:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/31/15 11:44), Sergey Senozhatsky wrote:
> On (12/22/15 14:47), Jan Kara wrote:
> [..]
> > +int printk_deferred(const char *fmt, ...)
> > +{
> > +	va_list args;
> > +	int r;
> > +
> > +	va_start(args, fmt);
> > +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> > +	va_end(args);
> > +
> > +	return r;
> > +}
> [..]
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	} else
> > +		sync_print = true;
> >  	local_irq_restore(flags);
> 
> So this fixes printk() and printk_deferred(), but it doesn't address any of the
> direct and indirect console_lock/console_unlock callers.
> 
> for example, direct:
> ~/_mmots$ git grep console_unlock | egrep -v "printk\.c|panic\.c|console\.h" | wc -l
> 199
> 
> indirect (e.g. via console_devices()):
> ~/_mmots$ git grep console_device | egrep -v "printk\.c|panic\.c|console\.h|_console_device" | wc -l
> 4
> 
> One of those indirect callers is tty_lookup_driver(), called from tty_open(). Which
> is quite big to ignore, I suspect.
>
> 
> A user space process opening a tty can end up doing that while (1) call_console_drivers()
> loop, I suspect. At least nothing prevents it, at a glance.

d'oh... sorry. that cold that I have is affecting me... no more emails for today.


cond_resched() does its job there, of course. well, a user process still can
do a lot of call_console_drivers() calls. may be we can check who is calling
console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
test) AND a user process then return from console_unlock() doing irq_work_queue()
and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.

	-ss

> A side note, isn't it too often to cond_resched() from console_unlock()? What if
> we have 10000000 very short printk() messages (e.g. no more than 32 chars).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
  2015-12-23  1:54   ` Sergey Senozhatsky
@ 2015-12-31  2:44   ` Sergey Senozhatsky
  2015-12-31  3:13     ` Sergey Senozhatsky
  2016-01-11 12:54   ` Petr Mladek
  3 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  2:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hello,

On (12/22/15 14:47), Jan Kara wrote:
[..]
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
[..]
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);

So this fixes printk() and printk_deferred(), but it doesn't address any of the
direct and indirect console_lock/console_unlock callers.

for example, direct:
~/_mmots$ git grep console_unlock | egrep -v "printk\.c|panic\.c|console\.h" | wc -l
199

indirect (e.g. via console_devices()):
~/_mmots$ git grep console_device | egrep -v "printk\.c|panic\.c|console\.h|_console_device" | wc -l
4

One of those indirect callers is tty_lookup_driver(), called from tty_open(). Which
is quite big to ignore, I suspect.

A user space process opening a tty can end up doing that while (1) call_console_drivers()
loop, I suspect. At least nothing prevents it, at a glance.

A side note, isn't it too often to cond_resched() from console_unlock()? What if
we have 10000000 very short printk() messages (e.g. no more than 32 chars).

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  3:57       ` Sergey Senozhatsky
@ 2015-12-23  4:15         ` Sergey Senozhatsky
  0 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  4:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 12:57), Sergey Senozhatsky wrote:
[..]
> > > can we replace this oops_in_progress check with something more reliable?
> > > 
> > > CPU0                                CPU1 - CPUN
> > > panic()
> > >  local_irq_disable()                executing foo() with irqs disabled,
> > >  console_verbose()                  or processing an extremely long irq handler.
> > >  bust_spinlocks()
> > >     oops_in_progress++
> 
> 					or we huge enough number of CPUs, `deep' stack
> 					traces, slow serial and CPU doing dump_stack()
> 					under raw_spin_lock(&stop_lock), so it can take
> 					longer than 1 second to print the stacks and
> 					thus panic CPU will set oops_in_progress back
> 					to 0.
> 
> > >  smp_send_stop()
> > > 
> > >  bust_spinlocks()
> > >     oops_in_progress--              ok, IPI arrives
> > >                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> > > 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> > > 
> > > smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> > > and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> > > all. Some do. For example, arm/arm64:
> > > 
> > > void smp_send_stop(void)
> > > ...
> > >         /* Wait up to one second for other CPUs to stop */
> > >         timeout = USEC_PER_SEC;
> > >         while (num_online_cpus() > 1 && timeout--)
> > >                 udelay(1);
> > > 
> > >         if (num_online_cpus() > 1)
> > >                 pr_warn("SMP: failed to stop secondary CPUs\n");
> > > ...
> > > 
> > > 
> > > so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> > > and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> > > 
> > > 
> > > bust_spinlocks(0) does
> > > ...
> > > 	if (--oops_in_progress == 0)
> > > 		wake_up_klogd();
> > > ...
> > > 
> > > but local cpu has irqs disabled and `panic_timeout' can be zero.

well, if panic_timeout != 0, then wake_up_klogd() calls irq_work_queue() which
schedule_work. what if we have the following

CPU0						CPU1 - CPUN

foo
preempt_disable
	bar
		panic				irq/fiq disable
			schedule_work		while (1) cpu_relax

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  3:37     ` Sergey Senozhatsky
@ 2015-12-23  3:57       ` Sergey Senozhatsky
  2015-12-23  4:15         ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  3:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 12:37), Sergey Senozhatsky wrote:
> Date: Wed, 23 Dec 2015 12:37:24 +0900
> From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Cc: Jan Kara <jack@suse.cz>, Sergey Senozhatsky
>  <sergey.senozhatsky@gmail.com>, Andrew Morton <akpm@linux-foundation.org>,
>  Petr Mladek <pmladek@suse.cz>, KY Sri nivasan <kys@microsoft.com>, Steven
>  Rostedt <rostedt@goodmis.org>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/7] printk: Hand over printing to console if printing
>  too long
> User-Agent: Mutt/1.5.24 (2015-08-30)
> 
> On (12/23/15 10:54), Sergey Senozhatsky wrote:
> > On (12/22/15 14:47), Jan Kara wrote:
> > [..]
> > > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> > >  	logbuf_cpu = UINT_MAX;
> > >  	raw_spin_unlock(&logbuf_lock);
> > >  	lockdep_on();
> > > +	/*
> > > +	 * By default we print message to console asynchronously so that kernel
> > > +	 * doesn't get stalled due to slow serial console. That can lead to
> > > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > > +	 * printing load.
> > > +	 *
> > > +	 * However we resort to synchronous printing of messages during early
> > > +	 * boot, when oops is in progress, or when synchronous printing was
> > > +	 * explicitely requested by kernel parameter.
> > > +	 */
> > > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > +	} else
> > > +		sync_print = true;
> 
> oops, didn't have enough coffee... missed that `else sync_print = true' :(
> 

ah, never mind my previous email... it's a local variable, so the very next printk()
happening right after bust_spinlocks(0) will irq_work_queue(). I'd prefer CPUs to
print stacks rather than burn cpu cycles in `while (1) cpu_relax()' loop.

so

else {
	printk_sync = true;
	sync_print = true; /* and remove this local variable entirely may be*/
}

> > can we replace this oops_in_progress check with something more reliable?
> > 
> > CPU0                                CPU1 - CPUN
> > panic()
> >  local_irq_disable()                executing foo() with irqs disabled,
> >  console_verbose()                  or processing an extremely long irq handler.
> >  bust_spinlocks()
> >     oops_in_progress++

					or we huge enough number of CPUs, `deep' stack
					traces, slow serial and CPU doing dump_stack()
					under raw_spin_lock(&stop_lock), so it can take
					longer than 1 second to print the stacks and
					thus panic CPU will set oops_in_progress back
					to 0.

> >  smp_send_stop()
> > 
> >  bust_spinlocks()
> >     oops_in_progress--              ok, IPI arrives
> >                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> > 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> > 
> > smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> > and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> > all. Some do. For example, arm/arm64:
> > 
> > void smp_send_stop(void)
> > ...
> >         /* Wait up to one second for other CPUs to stop */
> >         timeout = USEC_PER_SEC;
> >         while (num_online_cpus() > 1 && timeout--)
> >                 udelay(1);
> > 
> >         if (num_online_cpus() > 1)
> >                 pr_warn("SMP: failed to stop secondary CPUs\n");
> > ...
> > 
> > 
> > so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> > and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> > 
> > 
> > bust_spinlocks(0) does
> > ...
> > 	if (--oops_in_progress == 0)
> > 		wake_up_klogd();
> > ...
> > 
> > but local cpu has irqs disabled and `panic_timeout' can be zero.
> > 
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;

	printk_sync = true;

> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.


	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  1:54   ` Sergey Senozhatsky
@ 2015-12-23  3:37     ` Sergey Senozhatsky
  2015-12-23  3:57       ` Sergey Senozhatsky
  2016-01-05 14:37     ` Jan Kara
  1 sibling, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  3:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 10:54), Sergey Senozhatsky wrote:
> On (12/22/15 14:47), Jan Kara wrote:
> [..]
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	} else
> > +		sync_print = true;

oops, didn't have enough coffee... missed that `else sync_print = true' :(

	-ss

> >  	local_irq_restore(flags);
> 
> can we replace this oops_in_progress check with something more reliable?
> 
> CPU0                                CPU1 - CPUN
> panic()
>  local_irq_disable()                executing foo() with irqs disabled,
>  console_verbose()                  or processing an extremely long irq handler.
>  bust_spinlocks()
>     oops_in_progress++
> 
>  smp_send_stop()
> 
>  bust_spinlocks()
>     oops_in_progress--              ok, IPI arrives
>                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> 
> smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> all. Some do. For example, arm/arm64:
> 
> void smp_send_stop(void)
> ...
>         /* Wait up to one second for other CPUs to stop */
>         timeout = USEC_PER_SEC;
>         while (num_online_cpus() > 1 && timeout--)
>                 udelay(1);
> 
>         if (num_online_cpus() > 1)
>                 pr_warn("SMP: failed to stop secondary CPUs\n");
> ...
> 
> 
> so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> 
> 
> bust_spinlocks(0) does
> ...
> 	if (--oops_in_progress == 0)
> 		wake_up_klogd();
> ...
> 
> but local cpu has irqs disabled and `panic_timeout' can be zero.
> 
> How about setting 'sync_print' to 'true' in...
>   bust_spinlocks() /* only set to true */
> or
>   console_verbose() /* um... may be... */
> or
>   having a separate one-liner for that
> 
> void console_panic_mode(void)
> {
> 	sync_print = true;
> }
> 
> and call it early in panic(), before we send out IPI_STOP.
> 
> 	-ss
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
@ 2015-12-23  1:54   ` Sergey Senozhatsky
  2015-12-23  3:37     ` Sergey Senozhatsky
  2016-01-05 14:37     ` Jan Kara
  2015-12-31  2:44   ` Sergey Senozhatsky
  2016-01-11 12:54   ` Petr Mladek
  3 siblings, 2 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  1:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hi,

slowly looking through the patches.

On (12/22/15 14:47), Jan Kara wrote:
[..]
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);

can we replace this oops_in_progress check with something more reliable?

CPU0                                CPU1 - CPUN
panic()
 local_irq_disable()                executing foo() with irqs disabled,
 console_verbose()                  or processing an extremely long irq handler.
 bust_spinlocks()
    oops_in_progress++

 smp_send_stop()

 bust_spinlocks()
    oops_in_progress--              ok, IPI arrives
                                    dump_stack()/printk()/etc from IPI_CPU_STOP
			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.

smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
all. Some do. For example, arm/arm64:

void smp_send_stop(void)
...
        /* Wait up to one second for other CPUs to stop */
        timeout = USEC_PER_SEC;
        while (num_online_cpus() > 1 && timeout--)
                udelay(1);

        if (num_online_cpus() > 1)
                pr_warn("SMP: failed to stop secondary CPUs\n");
...


so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.


bust_spinlocks(0) does
...
	if (--oops_in_progress == 0)
		wake_up_klogd();
...

but local cpu has irqs disabled and `panic_timeout' can be zero.

How about setting 'sync_print' to 'true' in...
  bust_spinlocks() /* only set to true */
or
  console_verbose() /* um... may be... */
or
  having a separate one-liner for that

void console_panic_mode(void)
{
	sync_print = true;
}

and call it early in panic(), before we send out IPI_STOP.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
@ 2015-12-22 14:48   ` Sergey Senozhatsky
  2015-12-23  1:54   ` Sergey Senozhatsky
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-22 14:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On (12/22/15 14:47), Jan Kara wrote:
[..]
> Thanks for looking into my patches and sorry for replying with a delay. As
> I wrote in my previous email [1] even the referenced patches are not quite
> enough. Over last few days I have worked on redoing the stuff as we
> discussed with Linus and Andrew at Kernel Summit and I have new patches
> which are working fine for me. I still want to test them on some machines
> having real issues with udev during boot but so far stress-testing with
> serial console slowed down to ~1000 chars/sec on other machines and VMs
> looks promising.
> 
> I'm attaching them in case you want to have a look. They are on top of
> Tejun's patch adding cond_resched() (which is essential). I'll officially
> submit the patches once the testing is finished (but I'm not sure when I
> get to the problematic HW...).
> 

Hello,

Thanks a lot! Will take a look.

	-ss

> From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 13:10:31 +0100
> Subject: [PATCH 1/2] printk: Make printk() completely async
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and queues work to do the printing to console. This has
> the advantage that printing always happens from a schedulable contex and
> thus we don't lockup any particular CPU or even interrupts. Also it has
> the advantage that printk() is fast and thus kernel booting is not
> slowed down by slow serial console. Disadvantage of this method is that
> in case of crash there is higher chance that important messages won't
> appear in console output (we may need working scheduling to print
> message to console). We somewhat mitigate this risk by switching printk
> to the original method of immediate printing to console if oops is in
> progress.  Also for debugging purposes we provide printk.synchronous
> kernel parameter which resorts to the original printk behavior.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  Documentation/kernel-parameters.txt |  10 +++
>  kernel/printk/printk.c              | 144 +++++++++++++++++++++---------------
>  2 files changed, 95 insertions(+), 59 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 742f69d18fc8..4cf1bddeffc7 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 299c2f0e7350..d455d1bd0d2c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + */
> +static bool __read_mostly printk_sync;
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +#define PRINTK_PENDING_WAKEUP	0x01
> +#define PRINTK_PENDING_OUTPUT	0x02
> +
> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void printing_work_func(struct work_struct *work)
> +{
> +	console_lock();
> +	console_unlock();
> +}
> +
> +static DECLARE_WORK(printing_work, printing_work_func);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	/*
> +	 * We just schedule regular work to do the printing from irq work. We
> +	 * don't want to do printing here directly as that happens with
> +	 * interrupts disabled and thus is bad for interrupt latency. We also
> +	 * don't want to queue regular work from vprintk_emit() as that gets
> +	 * called in various difficult contexts where schedule_work() could
> +	 * deadlock.
> +	 */
> +	if (pending & PRINTK_PENDING_OUTPUT)
> +		schedule_work(&printing_work);
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	unsigned long flags;
>  	int this_cpu;
>  	int printed_len = 0;
> -	bool in_sched = false;
> +	bool sync_print = printk_sync;
>  	/* cpu currently holding logbuf_lock in this function */
>  	static unsigned int logbuf_cpu = UINT_MAX;
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> -		in_sched = true;
> +		sync_print = false;
>  	}
> -
>  	boot_delay_msec(level);
>  	printk_delay();
>  
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Disable preemption to avoid being preempted while holding
> @@ -2688,60 +2768,6 @@ late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
>  /*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);
> -
> -static void wake_up_klogd_work_func(struct irq_work *irq_work)
> -{
> -	int pending = __this_cpu_xchg(printk_pending, 0);
> -
> -	if (pending & PRINTK_PENDING_OUTPUT) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> -	}
> -
> -	if (pending & PRINTK_PENDING_WAKEUP)
> -		wake_up_interruptible(&log_wait);
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> -	.func = wake_up_klogd_work_func,
> -	.flags = IRQ_WORK_LAZY,
> -};
> -
> -void wake_up_klogd(void)
> -{
> -	preempt_disable();
> -	if (waitqueue_active(&log_wait)) {
> -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	}
> -	preempt_enable();
> -}
> -
> -int printk_deferred(const char *fmt, ...)
> -{
> -	va_list args;
> -	int r;
> -
> -	preempt_disable();
> -	va_start(args, fmt);
> -	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> -	va_end(args);
> -
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	preempt_enable();
> -
> -	return r;
> -}
> -
> -/*
>   * printk rate limiting, lifted from the networking subsystem.
>   *
>   * This enforces a rate limit: not more than 10 kernel messages
> -- 
> 2.6.2
> 

> From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 14:26:13 +0100
> Subject: [PATCH 2/2] printk: Skip messages on oops
> 
> When there are too many messages in the kernel printk buffer it can take
> very long to print them to console (especially when using slow serial
> console). This is undesirable during oops so when we encounter oops and
> there are more than 100 messages to print, print just the newest 100
> messages and then the oops message.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d455d1bd0d2c..fc67ab70e9c7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -262,6 +262,9 @@ static u64 console_seq;
>  static u32 console_idx;
>  static enum log_flags console_prev;
>  
> +/* current record sequence when oops happened */
> +static u64 oops_start_seq;
> +
>  /* the next printk record to read after the last 'clear' command */
>  static u64 clear_seq;
>  static u32 clear_idx;
> @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  					 NULL, 0, recursion_msg,
>  					 strlen(recursion_msg));
>  	}
> +	if (oops_in_progress && !sync_print && !oops_start_seq)
> +		oops_start_seq = log_next_seq;
>  
>  	/*
>  	 * The printf needs to come first; we need the syslog
> @@ -2292,6 +2297,12 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * When oops happens and there are more messages to be printed in the printk
> + * buffer that this, skip some mesages and print only this many newest messages.
> + */
> +#define PRINT_MSGS_BEFORE_OOPS 100
> +
>  /**
>   * console_unlock - unlock the console system
>   *
> @@ -2348,7 +2359,28 @@ again:
>  			seen_seq = log_next_seq;
>  		}
>  
> -		if (console_seq < log_first_seq) {
> +		/*
> +		 * If oops happened and there are more than
> +		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
> +		 * skip them to make oops appear faster.
> +		 */
> +		if (oops_start_seq &&
> +		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
> +			len = sprintf(text,
> +				"** %u printk messages dropped due to oops ** ",
> +				(unsigned)(oops_start_seq - console_seq -
> +					   PRINT_MSGS_BEFORE_OOPS));
> +			if (console_seq < log_first_seq) {
> +				console_seq = log_first_seq;
> +				console_idx = log_first_idx;
> +			}
> +			while (console_seq <
> +			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
> +				console_idx = log_next(console_idx);
> +				console_seq++;
> +			}
> +			console_prev = 0;
> +		} else if (console_seq < log_first_seq) {
>  			len = sprintf(text, "** %u printk messages dropped ** ",
>  				      (unsigned)(log_first_seq - console_seq));
>  
> -- 
> 2.6.2
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
  2015-12-11  4:27 ` Sergey Senozhatsky
@ 2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
                     ` (3 more replies)
  2 siblings, 4 replies; 35+ messages in thread
From: Jan Kara @ 2015-12-22 13:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 3610 bytes --]

On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> Hello,
> 
> *** in this email and in every later emails ***
>     Sorry, if I messed up with Cc list or message-ids. It's suprisingly
> hard to jump in into a loop that has never been in your inbox. It took
> some `googling' effort.
> 
> I haven't tested the patch set yet, I just 'ported' it to linux-next.
> I reverted 073696a8bc7779b ("printk: do cond_resched() between lines while
> outputting to consoles") as a first step, but it comes in later again. I can
> send out the updated series (off list is OK).
> 
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> >
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> >
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I think we better use raw_spin_lock as a print_lock; and, apart from that,
> seems that we don't re-init in zap_lock(). So I ended up with the following
> patch on top of yours (to be folded):
> 
> - use raw_spin_lock
> - do not forget to re-init `print_lock' in zap_locks()

Thanks for looking into my patches and sorry for replying with a delay. As
I wrote in my previous email [1] even the referenced patches are not quite
enough. Over last few days I have worked on redoing the stuff as we
discussed with Linus and Andrew at Kernel Summit and I have new patches
which are working fine for me. I still want to test them on some machines
having real issues with udev during boot but so far stress-testing with
serial console slowed down to ~1000 chars/sec on other machines and VMs
looks promising.

I'm attaching them in case you want to have a look. They are on top of
Tejun's patch adding cond_resched() (which is essential). I'll officially
submit the patches once the testing is finished (but I'm not sure when I
get to the problematic HW...).

								Honza

[1] http://www.spinics.net/lists/stable/msg111535.html
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-printk-Make-printk-completely-async.patch --]
[-- Type: text/x-patch, Size: 8977 bytes --]

>From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 21 Dec 2015 13:10:31 +0100
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt |  10 +++
 kernel/printk/printk.c              | 144 +++++++++++++++++++++---------------
 2 files changed, 95 insertions(+), 59 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d18fc8..4cf1bddeffc7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 299c2f0e7350..d455d1bd0d2c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ */
+static bool __read_mostly printk_sync;
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void printing_work_func(struct work_struct *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static DECLARE_WORK(printing_work, printing_work_func);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	/*
+	 * We just schedule regular work to do the printing from irq work. We
+	 * don't want to do printing here directly as that happens with
+	 * interrupts disabled and thus is bad for interrupt latency. We also
+	 * don't want to queue regular work from vprintk_emit() as that gets
+	 * called in various difficult contexts where schedule_work() could
+	 * deadlock.
+	 */
+	if (pending & PRINTK_PENDING_OUTPUT)
+		schedule_work(&printing_work);
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
-	bool in_sched = false;
+	bool sync_print = printk_sync;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
-		in_sched = true;
+		sync_print = false;
 	}
-
 	boot_delay_msec(level);
 	printk_delay();
 
@@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when oops is in progress, or when synchronous printing was
+	 * explicitely requested by kernel parameter.
+	 */
+	if (keventd_up() && !oops_in_progress && !sync_print) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	} else
+		sync_print = true;
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Disable preemption to avoid being preempted while holding
@@ -2688,60 +2768,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.6.2


[-- Attachment #3: 0002-printk-Skip-messages-on-oops.patch --]
[-- Type: text/x-patch, Size: 2753 bytes --]

>From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 21 Dec 2015 14:26:13 +0100
Subject: [PATCH 2/2] printk: Skip messages on oops

When there are too many messages in the kernel printk buffer it can take
very long to print them to console (especially when using slow serial
console). This is undesirable during oops so when we encounter oops and
there are more than 100 messages to print, print just the newest 100
messages and then the oops message.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d455d1bd0d2c..fc67ab70e9c7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -262,6 +262,9 @@ static u64 console_seq;
 static u32 console_idx;
 static enum log_flags console_prev;
 
+/* current record sequence when oops happened */
+static u64 oops_start_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 NULL, 0, recursion_msg,
 					 strlen(recursion_msg));
 	}
+	if (oops_in_progress && !sync_print && !oops_start_seq)
+		oops_start_seq = log_next_seq;
 
 	/*
 	 * The printf needs to come first; we need the syslog
@@ -2292,6 +2297,12 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * When oops happens and there are more messages to be printed in the printk
+ * buffer that this, skip some mesages and print only this many newest messages.
+ */
+#define PRINT_MSGS_BEFORE_OOPS 100
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2348,7 +2359,28 @@ again:
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		/*
+		 * If oops happened and there are more than
+		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
+		 * skip them to make oops appear faster.
+		 */
+		if (oops_start_seq &&
+		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
+			len = sprintf(text,
+				"** %u printk messages dropped due to oops ** ",
+				(unsigned)(oops_start_seq - console_seq -
+					   PRINT_MSGS_BEFORE_OOPS));
+			if (console_seq < log_first_seq) {
+				console_seq = log_first_seq;
+				console_idx = log_first_idx;
+			}
+			while (console_seq <
+			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
+				console_idx = log_next(console_idx);
+				console_seq++;
+			}
+			console_prev = 0;
+		} else if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-11  4:27 ` Sergey Senozhatsky
@ 2015-12-11  6:29   ` Sergey Senozhatsky
  0 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-11  6:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/11/15 13:27), Sergey Senozhatsky wrote:
[..]
> > static bool cpu_stop_printing(int printed_chars)
> > {
> >         /* Oops? Print everything now to maximize chances user will see it */
> >         if (oops_in_progress)
> >                 return false;
> >         if (!printk_offload_chars || printed_chars < printk_offload_chars)
> >                 return false;
> >         /*
> >          * Make sure we load fresh value of printing_tasks_spinning. Matches
> >          * the barrier in printing_task()
> >          */
> >         smp_rmb();
> >         if (atomic_read(&printing_tasks_spinning))
> >                 return true;
> >         wake_up(&print_queue);
> >
> -	return false;
> +	return true;
> > }

*just as an idea*, I was thinking about having two different offload
limits -- for user space processes doing console_unlock() for whatever
reason (printk in syscall or because of console_lock, etc.) and for
KTHREADS. the kernel threads can have normal offload_limit, while user
space processes can return back from syscal sooner (doing only half of
printk worload, for example). but this is probably too `custom', though
sort of make some sense.

(compile tested only)

---
 kernel/printk/printk.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 79915da..cff1dd1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -330,6 +330,7 @@ static struct kernel_param_ops offload_chars_ops = {
  * 0.1s maximum latency due to printing.
  */
 static unsigned int __read_mostly printk_offload_chars = 1000;
+static unsigned int __read_mostly printk_offload_limits[2] = {500, 1000};
 
 module_param_cb(offload_chars, &offload_chars_ops, &printk_offload_chars,
 		   S_IRUGO | S_IWUSR);
@@ -343,10 +344,14 @@ MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
  */
 static bool cpu_stop_printing(int printed_chars)
 {
+	bool type = current->flags & PF_KTHREAD;
+
 	/* Oops? Print everything now to maximize chances user will see it */
 	if (oops_in_progress)
 		return false;
-	if (!printk_offload_chars || printed_chars < printk_offload_chars)
+	if (!printk_offload_chars)
+		return false;
+	if (printed_chars < printk_offload_limits[type])
 		return false;
 	/*
 	 * Make sure we load fresh value of printing_tasks_spinning. Matches
@@ -2995,6 +3000,12 @@ out_err:
 	return ret;
 }
 
+static void offload_limits_set(void)
+{
+	printk_offload_limits[0] = printk_offload_chars >> 1;
+	printk_offload_limits[1] = printk_offload_chars;
+}
+
 static int offload_chars_set(const char *val, const struct kernel_param *kp)
 {
 	int ret;
@@ -3006,6 +3017,8 @@ static int offload_chars_set(const char *val, const struct kernel_param *kp)
 		mutex_unlock(&printing_kthread_mutex);
 		return ret;
 	}
+
+	offload_limits_set();
 	ret = printk_start_offload_kthreads();
 	mutex_unlock(&printing_kthread_mutex);
 	return ret;
@@ -3014,11 +3027,13 @@ static int offload_chars_set(const char *val, const struct kernel_param *kp)
 static void printk_offload_init(void)
 {
 	mutex_lock(&printing_kthread_mutex);
+	offload_limits_set();
 	if (num_possible_cpus() <= 1) {
 		/* Offloading doesn't make sense. Disable print offloading. */
 		printk_offload_chars = 0;
-	} else
+	} else {
 		printk_start_offload_kthreads();
+	}
 	mutex_unlock(&printing_kthread_mutex);
 }
 
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
@ 2015-12-11  4:27 ` Sergey Senozhatsky
  2015-12-11  6:29   ` Sergey Senozhatsky
  2015-12-22 13:47 ` Jan Kara
  2 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-11  4:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/10/15 23:52), Sergey Senozhatsky wrote:
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> >
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> >
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>


> static bool cpu_stop_printing(int printed_chars)
> {
>         /* Oops? Print everything now to maximize chances user will see it */
>         if (oops_in_progress)
>                 return false;
>         if (!printk_offload_chars || printed_chars < printk_offload_chars)
>                 return false;
>         /*
>          * Make sure we load fresh value of printing_tasks_spinning. Matches
>          * the barrier in printing_task()
>          */
>         smp_rmb();
>         if (atomic_read(&printing_tasks_spinning))
>                 return true;
>         wake_up(&print_queue);
>
-	return false;
+	return true;
> }


we just woke up a task we will offload printing to. no need to do another round
of call_console_drivers() from current, possibly overrunning printk_offload_chars
by strlen of next msg, which can be close to max text length in the worst case,
while woken up print task will burn cpu cycles spinning on the print_lock.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
@ 2015-12-10 15:24 ` Sergey Senozhatsky
  2015-12-11  4:27 ` Sergey Senozhatsky
  2015-12-22 13:47 ` Jan Kara
  2 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-10 15:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/10/15 23:52), Sergey Senozhatsky wrote:
> 
> I think we better use raw_spin_lock as a print_lock; and, apart from that,
> seems that we don't re-init in zap_lock(). So I ended up with the following
> patch on top of yours (to be folded):
> 
> - use raw_spin_lock
> - do not forget to re-init `print_lock' in zap_locks()

while we are on this, what do you guys think?


CPU1                                        CPU2
console_unlock()
  call_console_drivers()
    con->write()
       ... spin_lock ... uart, etc          panic

zap_lock() will raw_spin_lock_init(&logbuf_lock) and
sema_init(&console_sem, 1), but we still have `spin_lock'
held by con->write().

so the `panic' flush or print out will see con->write() being already
blocked.


===8<====

We do zap_lock() in printk to make a panic print out possible, but we
can end up having a locked serial console - e.g. panic has occurred
whilst CPUx was in con->write(), which takes some internal locks, thus
call_console_drivers() will perform con->write() on an already locked
console.

Try to reset() every console in zap_lock() via console specific
->reset() call.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h | 1 +
 kernel/printk/printk.c  | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index bd19434..1cb8f72 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -125,6 +125,7 @@ struct console {
 	void	(*unblank)(void);
 	int	(*setup)(struct console *, char *);
 	int	(*match)(struct console *, char *name, int idx, char *options);
+	void	(*reset)(struct console *);
 	short	flags;
 	short	index;
 	int	cflag;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f4a9565..ad172c4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1580,6 +1580,7 @@ static void call_console_drivers(int level,
  */
 static void zap_locks(void)
 {
+	struct console *c;
 	static unsigned long oops_timestamp;
 
 	if (time_after_eq(jiffies, oops_timestamp) &&
@@ -1589,6 +1590,11 @@ static void zap_locks(void)
 	oops_timestamp = jiffies;
 
 	debug_locks_off();
+
+	for_each_console(c)
+		if ((c->flags & CON_ENABLED) && c->reset)
+			c->reset(c);
+
 	/* If a crash is occurring, make sure we can't deadlock */
 	raw_spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
@ 2015-12-10 14:52 Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2015-12-10 14:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

*** in this email and in every later emails ***
    Sorry, if I messed up with Cc list or message-ids. It's suprisingly
hard to jump in into a loop that has never been in your inbox. It took
some `googling' effort.

I haven't tested the patch set yet, I just 'ported' it to linux-next.
I reverted 073696a8bc7779b ("printk: do cond_resched() between lines while
outputting to consoles") as a first step, but it comes in later again. I can
send out the updated series (off list is OK).

> Currently, console_unlock() prints messages from kernel printk buffer to
> console while the buffer is non-empty. When serial console is attached,
> printing is slow and thus other CPUs in the system have plenty of time
> to append new messages to the buffer while one CPU is printing. Thus the
> CPU can spend unbounded amount of time doing printing in console_unlock().
> This is especially serious problem if the printk() calling
> console_unlock() was called with interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Also during artificial stress testing SATA disk
> disappears from the system because its interrupts aren't served for too
> long.
>
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by waking up one of dedicated kthreads.  As
> soon as the printing CPU notices kthread got scheduled and is spinning
> on print_lock dedicated for that purpose, it drops console_sem,
> print_lock, and exits console_unlock(). Kthread then takes over printing
> instead. This way no CPU should spend printing too long even if there
> is heavy printk traffic.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

I think we better use raw_spin_lock as a print_lock; and, apart from that,
seems that we don't re-init in zap_lock(). So I ended up with the following
patch on top of yours (to be folded):

- use raw_spin_lock
- do not forget to re-init `print_lock' in zap_locks()
---
 kernel/printk/printk.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d986599..2a86ff1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,7 +85,7 @@ EXPORT_SYMBOL_GPL(console_drivers);
  * we can spin on it when some other thread wants to take over printing to
  * console.
  */
-static DEFINE_SPINLOCK(print_lock);
+static DEFINE_RAW_SPINLOCK(print_lock);
 
 /*
  * Number of printing threads spinning on print_lock. Can go away once
@@ -1516,6 +1516,7 @@ static void zap_locks(void)
 	/* If a crash is occurring, make sure we can't deadlock */
 	raw_spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
+	raw_spin_lock_init(&print_lock);
 	sema_init(&console_sem, 1);
 }
 
@@ -2311,7 +2312,7 @@ void console_unlock(void)
 	console_cont_flush(text, sizeof(text));
 again:
 	retry = false;
-	spin_lock_irqsave(&print_lock, flags);
+	raw_spin_lock_irqsave(&print_lock, flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2410,7 +2411,7 @@ skip:
 	 * succeeds in getting console_sem (unless someone else takes it and
 	 * then he'll be responsible for printing).
          */
-	spin_unlock_irqrestore(&print_lock, flags);
+	raw_spin_unlock_irqrestore(&print_lock, flags);
 
 	/*
 	 * In case we cannot trylock the console_sem again, there's a new owner
@@ -2773,9 +2774,9 @@ static int printing_task(void *arg)
 		 * want to sleep once we got scheduled to make sure we take
 		 * over printing without depending on the scheduler.
 		 */
-		spin_lock_irqsave(&print_lock, flags);
+		raw_spin_lock_irqsave(&print_lock, flags);
 		atomic_dec(&printing_tasks_spinning);
-		spin_unlock_irqrestore(&print_lock, flags);
+		raw_spin_unlock_irqrestore(&print_lock, flags);
 		if (console_trylock())
 			console_unlock();
 		preempt_enable();
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-03-02  9:30 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
2016-03-01 17:22   ` Denys Vlasenko
2016-03-02  9:30     ` Jan Kara
2015-10-26  4:52 ` [PATCH 2/7] printk: Start printing handover kthreads on demand Jan Kara
2015-10-26  4:52 ` [PATCH 3/7] kernel: Avoid softlockups in stop_machine() during heavy printing Jan Kara
2015-10-26  4:56   ` Jan Kara
2015-10-26  4:52 ` [PATCH 4/7] panic: Always flush printk buffer before panic Jan Kara
2015-10-26  4:52 ` [PATCH 5/7] printk: Add config option for disabling printk offloading Jan Kara
2015-10-26  4:52 ` [PATCH 6/7] printk: Avoid scheduling printing threads on the same CPU Jan Kara
2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
2015-12-10 15:24 ` Sergey Senozhatsky
2015-12-11  4:27 ` Sergey Senozhatsky
2015-12-11  6:29   ` Sergey Senozhatsky
2015-12-22 13:47 ` Jan Kara
2015-12-22 14:48   ` Sergey Senozhatsky
2015-12-23  1:54   ` Sergey Senozhatsky
2015-12-23  3:37     ` Sergey Senozhatsky
2015-12-23  3:57       ` Sergey Senozhatsky
2015-12-23  4:15         ` Sergey Senozhatsky
2016-01-05 14:37     ` Jan Kara
2016-01-06  1:41       ` Sergey Senozhatsky
2016-01-06  6:48       ` Sergey Senozhatsky
2016-01-06 12:25         ` Jan Kara
2016-01-11 13:25           ` Sergey Senozhatsky
2015-12-31  2:44   ` Sergey Senozhatsky
2015-12-31  3:13     ` Sergey Senozhatsky
2015-12-31  4:58       ` Sergey Senozhatsky
2016-01-05 14:48         ` Jan Kara
2016-01-06  3:38           ` Sergey Senozhatsky
2016-01-06  8:36             ` Sergey Senozhatsky
2016-01-06 10:21               ` Jan Kara
2016-01-06 11:10                 ` Sergey Senozhatsky
2016-01-11 12:54   ` Petr Mladek
2016-01-12 14:00     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.