* [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
@ 2017-12-04 13:48 Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
                   ` (14 more replies)
  0 siblings, 15 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Hello,

	RFC

	A new version, yet another rework. Lots of changes, e.g. hand off
control based on Steven's patch. Another change is that this time around
we finally have a kernel module to test printk offloading (YAYY!). The
module tests a bunch of use cases; we also have trace printk events to...
trace offloading. I'll post the testing script and test module in reply
to this mail. So... let's have some progress ;) The code is not completely
polished yet, but it is not tremendously complicated either. We can verify
the approach/design (we have tests and traces now) first and then start
improving the code.

8<---- ---- ----

	This patch set adds a printk() kernel thread which lets us
print kernel messages to the console from a non-atomic/schedulable
context, avoiding various sorts of lockups, stalls, etc.

v5->v6:
-- add hand off control (Steven)
-- dropped some of the previous patches (auto emergency mode, etc.)
-- remove `atomic_print_limit' knob
-- tons of other changes.

v4->v5
-- split some of the patches
-- make offloading time-based (not number of lines printed)
-- move offloading control to per-CPU
-- remove a pessimistic offloading spin from console_unlock()
-- adjust printk_kthread CPU affinity mask
-- disable preemption in console_unlock()
-- always offload printing from user space processes
-- add sync version of emergency_begin API
-- offload from printk_kthread as well, to periodically up() console_sem
-- limit `atomic_print_limit' to `watchdog_thresh'
-- and some other changes...

v3->v4 (Petr, Jan)
-- add syscore notifiers
-- fix 0001 compilation warnings
-- use proper CPU notifiers return values

v2->v3 (Petr, Pavel, Andreas):
-- rework offloading
-- use PM notifiers
-- dropped some patches, etc. etc.

v1->v2:
-- introduce printk_emergency mode and API to switch it on/off
-- move printk_pending out of per-CPU memory
-- add printk emergency_mode sysfs node
-- switch sysrq handlers (some of them) to printk_emergency
-- cleanups, etc.

Sergey Senozhatsky (12):
  printk: move printk_pending out of per-cpu
  printk: introduce printing kernel thread
  printk: consider watchdogs thresholds for offloading
  printk: add sync printk_emergency API
  printk: enable printk offloading
  PM: switch between printk emergency modes
  printk: register syscore notifier
  printk: force printk_kthread to offload printing
  printk: do not cond_resched() when we can offload
  printk: move offloading logic to per-cpu
  printk: add offloading watchdog API
  printk: improve printk offloading mechanism

 Documentation/admin-guide/kernel-parameters.txt |   7 +
 drivers/base/power/main.c                       |   5 +
 include/linux/console.h                         |   7 +
 kernel/printk/printk.c                          | 552 ++++++++++++++++++++++--
 kernel/watchdog.c                               |   6 +-
 5 files changed, 551 insertions(+), 26 deletions(-)

-- 
2.15.1


* [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Do not keep `printk_pending' in the per-CPU area. We set the following
bits of printk_pending:
a) PRINTK_PENDING_WAKEUP
	when we need to wake up klogd
b) PRINTK_PENDING_OUTPUT
	when there is pending output from deferred printk and we need
	to call console_unlock().

So none of the bits controls/represents the state of a particular CPU;
basically, they should be global instead.

Besides, we will use `printk_pending' to control the printk kthread, so
this patch is also preparation work.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8aa27be96012..81c19e51a4a4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -401,6 +401,14 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 	} while (0)
 
 #ifdef CONFIG_PRINTK
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static unsigned long printk_pending;
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
@@ -2677,25 +2685,15 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
-/*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
 	}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
+	if (test_and_clear_bit(PRINTK_PENDING_WAKEUP, &printk_pending))
 		wake_up_interruptible(&log_wait);
 }
 
@@ -2708,7 +2706,7 @@ void wake_up_klogd(void)
 {
 	preempt_disable();
 	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
 	preempt_enable();
@@ -2721,7 +2719,7 @@ int vprintk_deferred(const char *fmt, va_list args)
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 
 	preempt_disable();
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
 
-- 
2.15.1


* [RFC][PATCHv6 02/12] printk: introduce printing kernel thread
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

While (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

To avoid lockups we, clearly, need to break out of the console_unlock()
printing loop before the watchdog detects a lockup condition. This patch
introduces a '/sys/module/printk/parameters/offloading_enabled' sysfs param,
which enables/disables this functionality by offloading the printing duty
to another task. The printing offloading happens in console_unlock() and,
briefly, works as follows:
	if a process spends more than half of the watchdog's threshold value
in console_unlock(), it breaks out of the printing loop and unlocks
console_sem.

Since nothing guarantees that there will be another process sleeping on
console_sem or calling printk() on another CPU at the same time, the patch
also introduces an auxiliary kernel thread - printk_kthread - whose main
purpose is to take over the printing duty. The workflow thus turns into:
	if a process spends more than half of the watchdog's threshold value
in console_unlock(), it wakes up printk_kthread, breaks out of the printing
loop and unlocks console_sem.

The wakeup part is also a bit tricky, since the scheduler may well decide
that printk_kthread should run on the very same CPU as the process that is
currently printing. This means that offloading may potentially never take
place. That's why we play games with the printk_kthread affinity mask and
try to wake it up on a foreign CPU, so the printing take-over has a better
chance of succeeding (see the sketch below).
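
Below is a rough sketch, in pseudo-C, of the intended console_unlock()
flow. It is simplified: more_pending_messages() and
print_next_logbuf_message() are made-up placeholders for the existing
logbuf iteration; the real code is in the diff below.

	for (;;) {
		u64 printing_start_ts = local_clock();

		if (!more_pending_messages())
			break;
		print_next_logbuf_message();

		/*
		 * Printing for too long? Wake up printk_kthread
		 * (preferably on another CPU) and break out.
		 */
		if (should_handoff_printing(printing_start_ts))
			break;
	}
	up_console_sem();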

There are, however, cases when we can't (or should not) offload. For
example, we can't call into the scheduler from panic(), because this
may deadlock. Therefore printk() has a new 'emergency mode': in this
mode we never attempt to offload printing to printk_kthread. In some
places printk switches to printk_emergency mode automatically: for
instance, once an EMERG log level message appears in the log buffer;
in other places the user must explicitly forbid offloading. For that
purpose we provide two new functions:

 -- printk_emergency_begin()
    Disables printk offloading. All printk() calls (except for deferred
    printk) will attempt to lock the console_sem and, if successful,
    flush kernel log messages.

 -- printk_emergency_end()
    Enables printk offloading.

Offloading is not possible yet; it will be enabled in a later patch.
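
A minimal usage sketch (the caller here is hypothetical; the actual users
are added by later patches in this series, e.g. the PM and syscore hooks):

	/*
	 * Around a code path that must not wake_up() printk_kthread,
	 * e.g. during suspend or kexec:
	 */
	printk_emergency_begin();
	/* ... printk() now prints directly, no offloading ... */
	printk_emergency_end();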

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   7 +
 include/linux/console.h                         |   3 +
 kernel/printk/printk.c                          | 179 ++++++++++++++++++++++--
 3 files changed, 175 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 28467638488d..fa40d68db39d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3241,6 +3241,13 @@
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.offloading_enabled=
+			Enable/disable print out offloading to a dedicated
+			printk kthread. When enabled a task will wake up the
+			printk kthread every watchdog lockup threshold seconds.
+			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
+			default: disable.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 81c19e51a4a4..f4e84f83bff5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
+#include <linux/kthread.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -400,15 +401,16 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 		printk_safe_exit_irqrestore(flags);	\
 	} while (0)
 
-#ifdef CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
  */
 #define PRINTK_PENDING_WAKEUP	0x01
 #define PRINTK_PENDING_OUTPUT	0x02
 
 static unsigned long printk_pending;
 
+#ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
@@ -444,6 +446,118 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread;
+static cpumask_var_t printk_offload_cpus;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Enable/disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool offloading_enabled;
+
+module_param_named(offloading_enabled, offloading_enabled, bool, 0644);
+MODULE_PARM_DESC(offloading_enabled,
+		 "enable/disable print out offloading to printk kthread");
+
+static inline bool printk_offloading_enabled(void)
+{
+	if (system_state != SYSTEM_RUNNING || oops_in_progress)
+		return false;
+
+	return printk_kthread && offloading_enabled &&
+		atomic_read(&printk_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not wait for printk_kthread to stop (if it's
+ * already printing logbuf messages).
+ */
+void printk_emergency_begin(void)
+{
+	atomic_inc(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+	atomic_dec(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+static inline int offloading_threshold(void)
+{
+	/* Default threshold value is 10 seconds */
+	return 10;
+}
+
+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * amount of time a process can print from console_unlock().
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool should_handoff_printing(u64 printing_start_ts)
+{
+	static struct task_struct *printing_task;
+	static u64 printing_elapsed;
+	u64 now = local_clock();
+
+	if (!printk_offloading_enabled()) {
+		/* We are in emergency mode, disable printk_kthread */
+		if (current == printk_kthread)
+			return true;
+		return false;
+	}
+
+	/* A new task - reset the counters. */
+	if (printing_task != current) {
+		printing_task = current;
+		printing_elapsed = 0;
+		return false;
+	}
+
+	if (time_after_eq64(now, printing_start_ts))
+		printing_elapsed += now - printing_start_ts;
+
+	/* Once we offloaded to printk_ktread - keep printing */
+	if (current == printk_kthread)
+		return false;
+
+	/* Shrink down to seconds and check the offloading threshold */
+	if ((printing_elapsed >> 30LL) < offloading_threshold())
+		return false;
+
+	/*
+	 * We try to set `printk_kthread' CPU affinity to any online CPU
+	 * except for this_cpu. Because otherwise `printk_kthread' may be
+	 * scheduled on the same CPU and offloading will not take place.
+	 */
+	cpumask_copy(printk_offload_cpus, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), printk_offload_cpus);
+
+	/*
+	 * If this_cpu is the one and only online CPU, then try to wake up
+	 * `printk_kthread' on it.
+	 */
+	if (cpumask_empty(printk_offload_cpus))
+		cpumask_set_cpu(smp_processor_id(), printk_offload_cpus);
+
+	set_cpus_allowed_ptr(printk_kthread, printk_offload_cpus);
+	wake_up_process(printk_kthread);
+	return true;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1752,6 +1866,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	/*
+	 * Emergency level indicates that the system is unstable and, thus,
+	 * we better stop relying on wake_up(printk_kthread) and try to do
+	 * a direct printing.
+	 */
+	if (level == LOGLEVEL_EMERG)
+		offloading_enabled = false;
+
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -1869,6 +1992,13 @@ static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
+void printk_emergency_begin(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+void printk_emergency_end(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+static bool should_handoff_printing(u64 printing_start_ts) { return false; }
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2149,9 +2279,18 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool do_cond_resched, retry = false;
+	bool do_handoff = false;
 
 	if (console_suspended) {
+		/*
+		 * Here and later, we need to clear the PENDING_OUTPUT bit
+		 * in order to avoid an infinite loop in printk_kthread
+		 * function when console_unlock() cannot flush messages
+		 * because we suspended consoles. Someone else will print
+		 * the messages from resume_console().
+		 */
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		up_console_sem();
 		return;
 	}
@@ -2180,6 +2319,7 @@ void console_unlock(void)
 	 * console.
 	 */
 	if (!can_use_console()) {
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		console_locked = 0;
 		up_console_sem();
 		return;
@@ -2189,6 +2329,7 @@ void console_unlock(void)
 		struct printk_log *msg;
 		size_t ext_len = 0;
 		size_t len;
+		u64 printing_start_ts = local_clock();
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
@@ -2208,7 +2349,7 @@ void console_unlock(void)
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (do_handoff || console_seq == log_next_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2240,9 +2381,12 @@ void console_unlock(void)
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		/* Must be called under printk_safe */
+		do_handoff = should_handoff_printing(printing_start_ts);
 		printk_safe_exit_irqrestore(flags);
 
-		if (do_cond_resched)
+		if (!do_handoff && do_cond_resched)
 			cond_resched();
 	}
 	console_locked = 0;
@@ -2256,14 +2400,18 @@ void console_unlock(void)
 	up_console_sem();
 
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * Someone could have filled up the buffer again, so re-check
+	 * if there's something to flush. In case when trylock fails,
+	 * there's a new owner and the console_unlock() from them will
+	 * do the flush, no worries.
 	 */
-	raw_spin_lock(&logbuf_lock);
-	retry = console_seq != log_next_seq;
-	raw_spin_unlock(&logbuf_lock);
+	if (!do_handoff) {
+		raw_spin_lock(&logbuf_lock);
+		retry = console_seq != log_next_seq;
+		if (!retry)
+			clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
+		raw_spin_unlock(&logbuf_lock);
+	}
 	printk_safe_exit_irqrestore(flags);
 
 	if (retry && console_trylock())
@@ -2687,8 +2835,11 @@ late_initcall(printk_late_init);
 #if defined CONFIG_PRINTK
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
-		/* If trylock fails, someone else is doing the printing */
+	if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+		/*
+		 * If trylock fails, someone else is doing the printing.
+		 * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+		 */
 		if (console_trylock())
 			console_unlock();
 	}
-- 
2.15.1


* [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

There are several watchdogs, using different timeout values, so we
need to set the printk offloading threshold appropriately. This
patch extends offloading_threshold() to take the RCU, softlockup
and hardlockup timeouts into consideration.
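
A worked example (assuming the common defaults, so treat the numbers as
illustrative only): with CONFIG_RCU_CPU_STALL_TIMEOUT=21 and
watchdog_thresh=10, the RCU-based timeout is 21/2 + 1 = 11 seconds; with
the hardlockup detector enabled the threshold becomes
min(11, 10/2) + 1 = 6 seconds, so a task will attempt to offload after
roughly 6 seconds of printing.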

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f4e84f83bff5..3d4df3f02854 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -492,10 +492,29 @@ void printk_emergency_end(void)
 }
 EXPORT_SYMBOL_GPL(printk_emergency_end);
 
+/*
+ * Adjust max timeout value in the following order:
+ * a) 1/2 of RCU stall timeout - it is usually the largest
+ * b) 1/2 of hardlockup threshold (if enabled) - lower than softlockup
+ * c) 1/2 of softlockup threshold (if enabled)
+ * e) default value - 10 seconds
+ */
 static inline int offloading_threshold(void)
 {
+	int timeout = CONFIG_RCU_CPU_STALL_TIMEOUT / 2 + 1;
+
+#ifdef CONFIG_LOCKUP_DETECTOR
+	/* Hardlockup detector has a sample_period of `watchdog_thresh'. */
+	if ((watchdog_enabled & NMI_WATCHDOG_ENABLED) && watchdog_thresh)
+		return min(timeout, watchdog_thresh / 2) + 1;
+
+	/* Softlockup uses '2 * watchdog_thresh' as a threshold value. */
+	if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh)
+		return min(timeout, watchdog_thresh) + 1;
+#endif
+
 	/* Default threshold value is 10 seconds */
-	return 10;
+	return min(10, timeout);
 }
 
 /*
-- 
2.15.1


* [RFC][PATCHv6 04/12] printk: add sync printk_emergency API
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

We already have the `async' printk_emergency_begin(), which returns
immediately and does not guarantee that `printk_kthread' will have
stopped by the time it returns. Add a `sync' version, which waits for
`printk_kthread' to stop.
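
A usage sketch (this mirrors how a later patch in this series wires the
sync API into the PM core, dpm_prepare()/dpm_complete()):

	printk_emergency_begin_sync();	/* parks printk_kthread */
	/* work that must not wake up or rely on printk_kthread */
	printk_emergency_end_sync();	/* unparks printk_kthread */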

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h |  2 ++
 kernel/printk/printk.c  | 52 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 07005db4c788..8ce29b2381d2 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -189,6 +189,8 @@ extern void resume_console(void);
 
 extern void printk_emergency_begin(void);
 extern void printk_emergency_end(void);
+extern int printk_emergency_begin_sync(void);
+extern int printk_emergency_end_sync(void);
 
 int mda_console_init(void);
 void prom_con_init(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3d4df3f02854..16f5f5c7e541 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -472,6 +472,13 @@ static inline bool printk_offloading_enabled(void)
 		atomic_read(&printk_emergency) == 0;
 }
 
+static inline bool printk_kthread_should_stop(bool emergency)
+{
+	if (current != printk_kthread)
+		return false;
+	return emergency || kthread_should_park();
+}
+
 /*
  * This disables printing offloading and instead attempts
  * to do the usual console_trylock()->console_unlock().
@@ -492,6 +499,34 @@ void printk_emergency_end(void)
 }
 EXPORT_SYMBOL_GPL(printk_emergency_end);
 
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does wait for printk_kthread to stop.
+ */
+int printk_emergency_begin_sync(void)
+{
+	atomic_inc(&printk_emergency);
+	if (!printk_kthread)
+		return -EINVAL;
+
+	return kthread_park(printk_kthread);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_begin_sync);
+
+/* This re-enables printk_kthread offloading. */
+int printk_emergency_end_sync(void)
+{
+	atomic_dec(&printk_emergency);
+	if (!printk_kthread)
+		return -EINVAL;
+
+	kthread_unpark(printk_kthread);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(printk_emergency_end_sync);
+
 /*
  * Adjust max timeout value in the following order:
  * a) 1/2 of RCU stall timeout - it is usually the largest
@@ -531,13 +566,14 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 	static struct task_struct *printing_task;
 	static u64 printing_elapsed;
 	u64 now = local_clock();
+	bool emergency = !printk_offloading_enabled();
+
+	/* We are in emergency mode, disable printk_kthread */
+	if (printk_kthread_should_stop(emergency))
+		return true;
 
-	if (!printk_offloading_enabled()) {
-		/* We are in emergency mode, disable printk_kthread */
-		if (current == printk_kthread)
-			return true;
+	if (emergency)
 		return false;
-	}
 
 	/* A new task - reset the counters. */
 	if (printing_task != current) {
@@ -2017,6 +2053,12 @@ EXPORT_SYMBOL_GPL(printk_emergency_begin);
 void printk_emergency_end(void) {}
 EXPORT_SYMBOL_GPL(printk_emergency_end);
 
+int printk_emergency_begin_sync(void) { return 0; }
+EXPORT_SYMBOL_GPL(printk_emergency_begin_sync);
+
+int printk_emergency_end_sync(void) { return 0; }
+EXPORT_SYMBOL_GPL(printk_emergency_end_sync);
+
 static bool should_handoff_printing(u64 printing_start_ts) { return false; }
 #endif /* CONFIG_PRINTK */
 
-- 
2.15.1


* [RFC][PATCHv6 05/12] printk: enable printk offloading
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Initialize the kernel printing thread and make printk offloading
possible. By default the `offloading_enabled' module parameter is
set to `false', so no offloading will take place unless requested
by the user.
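
For reference (both knobs are introduced and documented by an earlier
patch in this series), offloading can then be enabled either at boot
time or at run time:

	printk.offloading_enabled=1                                 (boot parameter)
	echo 1 > /sys/module/printk/parameters/offloading_enabled   (run time)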

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 16f5f5c7e541..a427cee2aa00 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2914,6 +2914,65 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 	.flags = IRQ_WORK_LAZY,
 };
 
+static int printk_kthread_func(void *data)
+{
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		/*
+		 * We must check `printk_emergency' as well, to let
+		 * printk_emergency_begin() stop active `printk_kthread' at
+		 * some point. Otherwise we can end up in a loop:
+		 *   - we bail out of console_unlock() because of
+		 *     printk_kthread_should_stop()
+		 * and
+		 *   - don't schedule() and attempt to return back
+		 *     immediately to console_unlock() because we
+		 *     see PRINTK_PENDING_OUTPUT bit set.
+		 */
+		if (!test_bit(PRINTK_PENDING_OUTPUT, &printk_pending) ||
+				atomic_read(&printk_emergency) != 0)
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+
+		/* We might have been woken for stop */
+		if (kthread_should_park())
+			kthread_parkme();
+
+		console_lock();
+		console_unlock();
+
+		/* We might have been blocked on console_sem */
+		if (kthread_should_park())
+			kthread_parkme();
+	}
+
+	return 0;
+}
+
+/*
+ * Init printk kthread at late_initcall stage, after core/arch/device/etc.
+ * initialization.
+ */
+static int __init init_printk_kthread(void)
+{
+	struct task_struct *thread;
+
+	if (!alloc_cpumask_var(&printk_offload_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	thread = kthread_run(printk_kthread_func, NULL, "printk");
+	if (IS_ERR(thread)) {
+		pr_err("printk: unable to create printing thread\n");
+		free_cpumask_var(printk_offload_cpus);
+		return PTR_ERR(thread);
+	}
+
+	printk_kthread = thread;
+	return 0;
+}
+late_initcall(init_printk_kthread);
+
 void wake_up_klogd(void)
 {
 	preempt_disable();
-- 
2.15.1


* [RFC][PATCHv6 06/12] PM: switch between printk emergency modes
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

It's not always possible/safe to wake_up() the printk kernel
thread. For example, late suspend/early resume may printk()
while timekeeping is not initialized yet, so calling into the
scheduler may result in recursive warnings.

Another thing to notice is the fact that PM at some point
freezes user space and kernel threads: freeze_processes()
and freeze_kernel_threads(), respectively. Thus we need
printk() to operate in emergency mode there and attempt to
immediately flush pending kernel messages to the console.

This patch adds printk emergency mode switches to the dpm code.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
---
 drivers/base/power/main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c0d5f4a3611d..5bc2cf1f812c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -34,6 +34,7 @@
 #include <linux/cpufreq.h>
 #include <linux/cpuidle.h>
 #include <linux/timer.h>
+#include <linux/console.h>
 
 #include "../base.h"
 #include "power.h"
@@ -1059,6 +1060,8 @@ void dpm_complete(pm_message_t state)
 	/* Allow device probing and trigger re-probing of deferred devices */
 	device_unblock_probing();
 	trace_suspend_resume(TPS("dpm_complete"), state.event, false);
+
+	printk_emergency_end_sync();
 }
 
 /**
@@ -1789,6 +1792,8 @@ int dpm_prepare(pm_message_t state)
 	trace_suspend_resume(TPS("dpm_prepare"), state.event, true);
 	might_sleep();
 
+	printk_emergency_begin_sync();
+
 	/*
 	 * Give a chance for the known devices to complete their probes, before
 	 * disable probing of devices. This sync point is important at least
-- 
2.15.1


* [RFC][PATCHv6 07/12] printk: register syscore notifier
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

We need to switch to emergency printk mode in kernel_kexec(). One
kernel_kexec() branch calls kernel_restart_prepare(), which updates
`system_state'; the other one, however, when the user requested
->preserve_context, does not, so printk lacks the information that
kexec is being executed. Register a syscore notifier so that printk
will be notified from syscore_suspend().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a427cee2aa00..372c71f69e8e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -49,6 +49,7 @@
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
 #include <linux/kthread.h>
+#include <linux/syscore_ops.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -2914,6 +2915,22 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 	.flags = IRQ_WORK_LAZY,
 };
 
+static int printk_syscore_suspend(void)
+{
+	printk_emergency_begin();
+	return 0;
+}
+
+static void printk_syscore_resume(void)
+{
+	printk_emergency_end();
+}
+
+static struct syscore_ops printk_syscore_ops = {
+	.suspend = printk_syscore_suspend,
+	.resume = printk_syscore_resume,
+};
+
 static int printk_kthread_func(void *data)
 {
 	while (1) {
@@ -2961,6 +2978,8 @@ static int __init init_printk_kthread(void)
 	if (!alloc_cpumask_var(&printk_offload_cpus, GFP_KERNEL))
 		return -ENOMEM;
 
+	register_syscore_ops(&printk_syscore_ops);
+
 	thread = kthread_run(printk_kthread_func, NULL, "printk");
 	if (IS_ERR(thread)) {
 		pr_err("printk: unable to create printing thread\n");
-- 
2.15.1


* [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

As of now we don't `offload' printing from printk_kthread; it
prints all pending logbuf messages. This, however, may have a
negative effect. We still hold console_sem as long as we have
messages to print, and there might be other console_lock()
callers sleeping on console_sem in TASK_UNINTERRUPTIBLE,
including user space processes (tty_open, drm IOCTL, etc.).

A bigger issue is that we can still schedule with console_sem
locked, which is not right and which we want to avoid. This patch
also serves as preparation for a preemption-disabled printing
loop.

So we need to up() console_sem every once in a while, even if
the current console_sem owner is printk_kthread, just to wake
up those other processes that may be sleeping on console_sem.

If there are no tasks sleeping on console_sem, then printk_kthread
will immediately return to console_unlock(), because we don't
clear the PRINTK_PENDING_OUTPUT bit and printk_kthread checks it
before it decides to schedule().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 372c71f69e8e..4d91182e25e3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -586,14 +586,27 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 	if (time_after_eq64(now, printing_start_ts))
 		printing_elapsed += now - printing_start_ts;
 
-	/* Once we offloaded to printk_ktread - keep printing */
-	if (current == printk_kthread)
-		return false;
-
 	/* Shrink down to seconds and check the offloading threshold */
 	if ((printing_elapsed >> 30LL) < offloading_threshold())
 		return false;
 
+	if (current == printk_kthread) {
+		/*
+		 * All tasks must offload - we don't want to keep console_sem
+		 * locked for too long. However, printk_kthread may be the
+		 * only process left willing to down(). So we may return back
+		 * immediately after we leave, because PRINTK_PENDING_OUTPUT
+		 * bit is still set and printk_kthread_func() won't schedule.
+		 * This still counts as offloading, so reset the stats.
+		 *
+		 * Should `printk_kthread' immediately return back to
+		 * console_unlock(), it will have another full
+		 * `offloading_threshold()' time slice.
+		 */
+		printing_elapsed = 0;
+		return true;
+	}
+
 	/*
 	 * We try to set `printk_kthread' CPU affinity to any online CPU
 	 * except for this_cpu. Because otherwise `printk_kthread' may be
-- 
2.15.1


* [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (7 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

console_unlock() may sleep with console_sem locked, which is a bit
counter-intuitive: we neither print pending logbuf messages to the
serial console, nor let anyone else do it for us.

With printing offloading enabled, however, we can disable preemption,
because we know for sure how long we can stay in console_unlock() and
that eventually we will offload to another task.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4d91182e25e3..2a12d4c02da1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2074,6 +2074,7 @@ int printk_emergency_end_sync(void) { return 0; }
 EXPORT_SYMBOL_GPL(printk_emergency_end_sync);
 
 static bool should_handoff_printing(u64 printing_start_ts) { return false; }
+static bool printk_offloading_enabled(void) { return false; }
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2385,6 +2386,14 @@ void console_unlock(void)
 	 * and cleared after the the "again" goto label.
 	 */
 	do_cond_resched = console_may_schedule;
+	/*
+	 * Forbid scheduling under the console_sem lock when offloading
+	 * is enabled. Scheduling will just slow down the print out in
+	 * this case.
+	 */
+	if (printk_offloading_enabled())
+		do_cond_resched = 0;
+
 again:
 	console_may_schedule = 0;
 
@@ -2400,6 +2409,7 @@ void console_unlock(void)
 		return;
 	}
 
+	preempt_disable();
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2461,8 +2471,11 @@ void console_unlock(void)
 		do_handoff = should_handoff_printing(printing_start_ts);
 		printk_safe_exit_irqrestore(flags);
 
-		if (!do_handoff && do_cond_resched)
+		if (!do_handoff && do_cond_resched) {
+			preempt_enable();
 			cond_resched();
+			preempt_disable();
+		}
 	}
 	console_locked = 0;
 
@@ -2473,6 +2486,7 @@ void console_unlock(void)
 	raw_spin_unlock(&logbuf_lock);
 
 	up_console_sem();
+	preempt_enable();
 
 	/*
 	 * Someone could have filled up the buffer again, so re-check
-- 
2.15.1


* [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (8 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

We have a global offloading state and make the offloading
decision based on the printing task pointer and the elapsed time.
If we keep seeing the same task printing for too long, we request
offloading; otherwise, when we see that printing is now performed
by another task, we reset the printing task pointer and its
elapsed counter.

This, however, will not work in the following case:

===============================================================================

CPU0						CPU1
//taskA						//taskB
preempt_disable()				preempt_disable()

 printk()
  console_trylock()
  console_unlock()
   printing_task = taskA
  up()
						printk()
						 console_trylock()
						 console_unlock()
						  printing_task = taskB
						  ^^^ reset offloading control
						up()
 printk()
  console_trylock()
  console_unlock()
   printing_task = taskA
   ^^^ reset offloading control
  up()
						printk()
						 console_trylock()
						 console_unlock()
						  printing_task = taskB
						  ^^^ reset offloading control
						up()
...
===============================================================================

So this printk ping-pong confuses our offloading control logic.
Move it to the per-CPU area and have separate offloading control
on every CPU.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2a12d4c02da1..2f9697c71cf1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -560,12 +560,12 @@ static inline int offloading_threshold(void)
  * amount of time a process can print from console_unlock().
  *
  * This function must be called from 'printk_safe' context under
- * console_sem lock.
+ * console_sem lock with preemption disabled.
  */
 static inline bool should_handoff_printing(u64 printing_start_ts)
 {
+	static DEFINE_PER_CPU(u64, printing_elapsed);
 	static struct task_struct *printing_task;
-	static u64 printing_elapsed;
 	u64 now = local_clock();
 	bool emergency = !printk_offloading_enabled();
 
@@ -578,19 +578,26 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 
 	/* A new task - reset the counters. */
 	if (printing_task != current) {
+		__this_cpu_write(printing_elapsed, 0);
 		printing_task = current;
-		printing_elapsed = 0;
 		return false;
 	}
 
-	if (time_after_eq64(now, printing_start_ts))
-		printing_elapsed += now - printing_start_ts;
+	if (time_after_eq64(now, printing_start_ts)) {
+		u64 t = __this_cpu_read(printing_elapsed);
+
+		t += now - printing_start_ts;
+		__this_cpu_write(printing_elapsed, t);
+	}
 
 	/* Shrink down to seconds and check the offloading threshold */
-	if ((printing_elapsed >> 30LL) < offloading_threshold())
+	if ((__this_cpu_read(printing_elapsed) >> 30LL) <
+			offloading_threshold())
 		return false;
 
 	if (current == printk_kthread) {
+		unsigned int cpu;
+
 		/*
 		 * All tasks must offload - we don't want to keep console_sem
 		 * locked for too long. However, printk_kthread may be the
@@ -603,7 +610,8 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 		 * console_unlock(), it will have another full
 		 * `offloading_threshold()' time slice.
 		 */
-		printing_elapsed = 0;
+		for_each_possible_cpu(cpu)
+			per_cpu(printing_elapsed, cpu) = 0;
 		return true;
 	}
 
-- 
2.15.1


* [RFC][PATCHv6 11/12] printk: add offloading watchdog API
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (9 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Introduce a printk_offloading watchdog API to control the behaviour
of offloading. Some control paths effectively disable the soft-lockup
watchdog by calling touch_all_softlockup_watchdogs(). One such
example is sysrq-t:

	__handle_sysrq()
	 sysrq_handle_showstate()
	  show_state()
	   show_state_filter()
	    touch_all_softlockup_watchdogs()

This control path deliberately and forcibly silences the watchdog
for various reasons, one of which is the fact that sysrq-t may
be called when the system is in a bad condition and we need to
print backtraces as soon as possible. The argument here might be that
  "In this case calling into the scheduler from printk offloading
   may be dangerous and in general should be avoided"

But this argument is not quite true. The reason is that we can
already call into the scheduler from sysrq, simply because
every time we call printk() from the show_state() loop we end up
in up():

</sysrq>
	__handle_sysrq()
	 sysrq_handle_showstate()
	  show_state()
	   show_state_filter()
	    printk()
	     console_unlock()
	      up()
	       wake_up_process()

So offloading to printk_kthread does not add anything to the
picture. It does, however, change the behaviour of sysrq in
some corner cases, and we have regression reports. The problem
is that sysrq attempts to "flush all pending logbuf messages"
before it actually handles the sysrq: all those pr_info()
and pr_cont() in __handle_sysrq(). Offloading to the printk
kthread will let sysrq handle the sysrq event before we flush
the logbuf entries, so emergency_restart(), for instance, may
lead to missing kernel logs.

The thing we need to notice is that such a "flush logbuf from
sysrq" has never been guaranteed and is, in fact, surprising,
to say the least. For example, emergency_restart() probably
should not call into any subsystems and should just reboot the
kernel. We have what we have, though, and with this patch we
are just trying to preserve the existing behaviour.

This patch adds a touch_printk_offloading_watchdog() call to the
watchdog's touch_softlockup_watchdog(), so each time a control
path resets the watchdog it also resets the printk offloading
timestamp, effectively disabling printing offloading.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h |  2 ++
 kernel/printk/printk.c  | 27 ++++++++++++++++++++++++---
 kernel/watchdog.c       |  6 +++++-
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 8ce29b2381d2..7408a345f4b1 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -191,6 +191,8 @@ extern void printk_emergency_begin(void);
 extern void printk_emergency_end(void);
 extern int printk_emergency_begin_sync(void);
 extern int printk_emergency_end_sync(void);
+extern void touch_printk_offloading_watchdog(void);
+extern void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu);
 
 int mda_console_init(void);
 void prom_con_init(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2f9697c71cf1..2a1ec075cc13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -460,6 +460,9 @@ static atomic_t printk_emergency __read_mostly;
  */
 static bool offloading_enabled;
 
+/* How long have this CPU spent in console_unlock() */
+static DEFINE_PER_CPU(u64, printing_elapsed);
+
 module_param_named(offloading_enabled, offloading_enabled, bool, 0644);
 MODULE_PARM_DESC(offloading_enabled,
 		 "enable/disable print out offloading to printk kthread");
@@ -553,6 +556,23 @@ static inline int offloading_threshold(void)
 	return min(10, timeout);
 }
 
+/*
+ * Must be called by the watchdog. When control path calls
+ * touch_all_softlockup_watchdogs() or touch_softlockup_watchdog()
+ * to silent the watchdog we need to also reset the printk
+ * offloading counter in order to avoid printk offloading from a
+ * potentially unsafe context.
+ */
+void touch_printk_offloading_watchdog(void)
+{
+	this_cpu_write(printing_elapsed, 0);
+}
+
+void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu)
+{
+	per_cpu(printing_elapsed, cpu) = 0;
+}
+
 /*
  * Under heavy printing load or with a slow serial console (or both)
  * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
@@ -564,7 +584,6 @@ static inline int offloading_threshold(void)
  */
 static inline bool should_handoff_printing(u64 printing_start_ts)
 {
-	static DEFINE_PER_CPU(u64, printing_elapsed);
 	static struct task_struct *printing_task;
 	u64 now = local_clock();
 	bool emergency = !printk_offloading_enabled();
@@ -578,7 +597,7 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 
 	/* A new task - reset the counters. */
 	if (printing_task != current) {
-		__this_cpu_write(printing_elapsed, 0);
+		touch_printk_offloading_watchdog();
 		printing_task = current;
 		return false;
 	}
@@ -611,7 +630,7 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 		 * `offloading_threshold()' time slice.
 		 */
 		for_each_possible_cpu(cpu)
-			per_cpu(printing_elapsed, cpu) = 0;
+			touch_printk_offloading_watchdog_on_cpu(cpu);
 		return true;
 	}
 
@@ -2083,6 +2102,8 @@ EXPORT_SYMBOL_GPL(printk_emergency_end_sync);
 
 static bool should_handoff_printing(u64 printing_start_ts) { return false; }
 static bool printk_offloading_enabled(void) { return false; }
+void touch_printk_offloading_watchdog(void) {}
+void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu) {}
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 576d18045811..27b7ce1088c7 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -26,6 +26,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/isolation.h>
+#include <linux/console.h>
 
 #include <asm/irq_regs.h>
 #include <linux/kvm_para.h>
@@ -277,6 +278,7 @@ void touch_softlockup_watchdog_sched(void)
 
 void touch_softlockup_watchdog(void)
 {
+	touch_printk_offloading_watchdog();
 	touch_softlockup_watchdog_sched();
 	wq_watchdog_touch(raw_smp_processor_id());
 }
@@ -295,8 +297,10 @@ void touch_all_softlockup_watchdogs(void)
 	 * update as well, the only side effect might be a cycle delay for
 	 * the softlockup check.
 	 */
-	for_each_cpu(cpu, &watchdog_allowed_mask)
+	for_each_cpu(cpu, &watchdog_allowed_mask) {
 		per_cpu(watchdog_touch_ts, cpu) = 0;
+		touch_printk_offloading_watchdog_on_cpu(cpu);
+	}
 	wq_watchdog_touch(-1);
 }
 
-- 
2.15.1


* [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (10 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
@ 2017-12-04 13:48 ` Sergey Senozhatsky
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:48 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

The existing offloading mechanism breaks out of the console_unlock()
loop without knowing whether offloading will ever succeed. This is
not a big problem for
	while (...)
		printk()

loops, because the control path will return to console_unlock()
anyway; that is not always true for the following case:

	CPU0				CPU1
	console_lock()
					printk()
					...
					printk()
	console_unlock()

Breaking out of console_unlock() in this case might leave pending
messages in the logbuf.

Steven Rostedt came up with the following printing hand off scheme [1]:

: I added a "console_owner" which is set to a task that is actively
: writing to the consoles. It is *not* the same an the owner of the
: console_lock. It is only set when doing the calls to the console
: functions. It is protected by a console_owner_lock which is a raw spin
: lock.
:
: There is a console_waiter. This is set when there is an active console
: owner that is not current, and waiter is not set. This too is protected
: by console_owner_lock.
:
: In printk() when it tries to write to the consoles, we have:
:
:        if (console_trylock())
:                console_unlock();
:
: Now I added an else, which will check if there is an active owner, and
: no current waiter. If that is the case, then console_waiter is set, and
: the task goes into a spin until it is no longer set.
:
: When the active console owner finishes writing the current message to
: the consoles, it grabs the console_owner_lock and sees if there is a
: waiter, and clears console_owner.
:
: If there is a waiter, then it breaks out of the loop, clears the waiter
: flag (because that will release the waiter from its spin), and exits.
: Note, it does *not* release the console semaphore. Because it is a
: semaphore, there is no owner. Another task may release it. This means
: that the waiter is guaranteed to be the new console owner! Which it
: becomes.
:
: Then the waiter calls console_unlock() and continues to write to the
: consoles.

This patch is based on Steven's idea. The key difference is that we
hand off printing to printk_kthread, using the existing printk
offloading logic. So there is only one task doing the print out in
console_unlock() and only one task we can hand off printing to -
printk_kthread. Since console_owner is set only while we call the
console drivers, printk_kthread can be waiting for the hand off via
one of two paths:

a) a fast path - console_owner is set and printk_kthread is looping,
   waiting for the printing CPU to hand off
   (the SPINNING_WAITER_HANDOFF case)

b) a slow path - console_owner is not set, so printk_kthread cannot
   spin on console_owner (the SLEEPING_WAITER_HANDOFF case). Instead
   it must sleep on the console_sem semaphore, and the hand off in
   this case involves up().

The printing CPU then detects from the console_unlock() loop that
printk_kthread is either in SPINNING_WAITER_HANDOFF state, in which
case console semaphore ownership is handed off to it directly; or in
SLEEPING_WAITER_HANDOFF state, in which case we up() the console
semaphore and wake it up.
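
For orientation, the waiter side of the scheme boils down to roughly
the following - a condensed sketch of the printk_kthread_func() changes
in the patch below, with the lockdep annotations and tracing dropped:

	unsigned long flags;
	bool spin;

	local_irq_save(flags);
	raw_spin_lock(&console_owner_lock);
	/* can we busy-wait for a direct hand off? */
	spin = active_console_owner;
	WRITE_ONCE(console_handoff_waiter, spin ?
			SPINNING_WAITER_HANDOFF : SLEEPING_WAITER_HANDOFF);
	raw_spin_unlock(&console_owner_lock);

	if (spin) {
		/* the printing CPU clears the flag when it hands off */
		while (READ_ONCE(console_handoff_waiter) ==
				SPINNING_WAITER_HANDOFF)
			cpu_relax();
		local_irq_restore(flags);
		/* we now own console_sem without having slept on it */
	} else {
		local_irq_restore(flags);
		console_lock();		/* woken up by up() */
		WRITE_ONCE(console_handoff_waiter, DONT_HANDOFF);
	}
	console_unlock();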

[1] lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 169 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2a1ec075cc13..2395f18fec53 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -411,6 +411,13 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 static unsigned long printk_pending;
 
+enum console_handoff {
+	DONT_HANDOFF		= 0,
+	SPINNING_WAITER_HANDOFF	= (1 << 0),
+	SLEEPING_WAITER_HANDOFF	= (1 << 1),
+	PRINTK_KTHREAD_HANDOFF	= (1 << 2),
+};
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -467,6 +474,16 @@ module_param_named(offloading_enabled, offloading_enabled, bool, 0644);
 MODULE_PARM_DESC(offloading_enabled,
 		 "enable/disable print out offloading to printk kthread");
 
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
+#endif
+
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static bool active_console_owner;
+static unsigned long console_handoff_waiter;
+
 static inline bool printk_offloading_enabled(void)
 {
 	if (system_state != SYSTEM_RUNNING || oops_in_progress)
@@ -573,6 +590,43 @@ void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu)
 	per_cpu(printing_elapsed, cpu) = 0;
 }
 
+static void spinning_waiter_handoff_enable(void)
+{
+	raw_spin_lock(&console_owner_lock);
+	active_console_owner = true;
+	raw_spin_unlock(&console_owner_lock);
+	/* The waiter may spin on us after setting console_owner */
+	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+}
+
+static unsigned long spinning_waiter_handoff_disable(void)
+{
+	unsigned long waiter;
+
+	raw_spin_lock(&console_owner_lock);
+	active_console_owner = false;
+	waiter = READ_ONCE(console_handoff_waiter);
+	raw_spin_unlock(&console_owner_lock);
+
+	if (!(waiter == SPINNING_WAITER_HANDOFF)) {
+		/* There was no waiter, and nothing will spin on us here */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	}
+	return waiter;
+}
+
+static void console_handoff_printing(void)
+{
+	WRITE_ONCE(console_handoff_waiter, DONT_HANDOFF);
+	/* The waiter is now free to continue */
+	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	/*
+	 * Hand off console_lock to waiter. The waiter will perform
+	 * the up(). After this, the waiter is the console_lock owner.
+	 */
+	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+}
+
 /*
  * Under heavy printing load or with a slow serial console (or both)
  * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
@@ -582,24 +636,35 @@ void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu)
  * This function must be called from 'printk_safe' context under
  * console_sem lock with preemption disabled.
  */
-static inline bool should_handoff_printing(u64 printing_start_ts)
+static inline enum console_handoff
+should_handoff_printing(u64 printing_start_ts)
 {
 	static struct task_struct *printing_task;
 	u64 now = local_clock();
 	bool emergency = !printk_offloading_enabled();
+	unsigned long waiter = spinning_waiter_handoff_disable();
 
 	/* We are in emergency mode, disable printk_kthread */
 	if (printk_kthread_should_stop(emergency))
-		return true;
+		return PRINTK_KTHREAD_HANDOFF;
+
+	/*
+	 * There is a printk_kthread waiting for us to release the
+	 * console_sem, either in SPINNING_WAITER_HANDOFF or in
+	 * SLEEPING_WAITER_HANDOFF mode. console_unlock() will take
+	 * care of it.
+	 */
+	if (waiter)
+		return waiter;
 
 	if (emergency)
-		return false;
+		return DONT_HANDOFF;
 
 	/* A new task - reset the counters. */
 	if (printing_task != current) {
 		touch_printk_offloading_watchdog();
 		printing_task = current;
-		return false;
+		return DONT_HANDOFF;
 	}
 
 	if (time_after_eq64(now, printing_start_ts)) {
@@ -612,7 +677,7 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 	/* Shrink down to seconds and check the offloading threshold */
 	if ((__this_cpu_read(printing_elapsed) >> 30LL) <
 			offloading_threshold())
-		return false;
+		return DONT_HANDOFF;
 
 	if (current == printk_kthread) {
 		unsigned int cpu;
@@ -631,7 +696,7 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 		 */
 		for_each_possible_cpu(cpu)
 			touch_printk_offloading_watchdog_on_cpu(cpu);
-		return true;
+		return PRINTK_KTHREAD_HANDOFF;
 	}
 
 	/*
@@ -651,7 +716,7 @@ static inline bool should_handoff_printing(u64 printing_start_ts)
 
 	set_cpus_allowed_ptr(printk_kthread, printk_offload_cpus);
 	wake_up_process(printk_kthread);
-	return true;
+	return DONT_HANDOFF;
 }
 
 /* Return log buffer address */
@@ -2100,10 +2165,16 @@ EXPORT_SYMBOL_GPL(printk_emergency_begin_sync);
 int printk_emergency_end_sync(void) { return 0; }
 EXPORT_SYMBOL_GPL(printk_emergency_end_sync);
 
-static bool should_handoff_printing(u64 printing_start_ts) { return false; }
+static enum console_handoff should_handoff_printing(u64 printing_start_ts)
+{
+	return DONT_HANDOFF;
+}
+
 static bool printk_offloading_enabled(void) { return false; }
 void touch_printk_offloading_watchdog(void) {}
 void touch_printk_offloading_watchdog_on_cpu(unsigned int cpu) {}
+static void spinning_waiter_handoff_enable(void) {}
+static void console_handoff_printing(void) {}
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2385,7 +2456,7 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry = false;
-	bool do_handoff = false;
+	enum console_handoff do_handoff = DONT_HANDOFF;
 
 	if (console_suspended) {
 		/*
@@ -2492,12 +2563,43 @@ void console_unlock(void)
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
 
+		/*
+		 * Disable is called from should_handoff_printing(). See
+		 * comment below.
+		 */
+		spinning_waiter_handoff_enable();
+
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
 		/* Must be called under printk_safe */
 		do_handoff = should_handoff_printing(printing_start_ts);
+		/*
+		 * We have two paths for hand off:
+		 *
+		 * 1) a fast one when we have a SPINNING_WAITER with local
+		 * IRQs disabled on another CPU;
+		 *
+		 * 2) a slow one when we have a SLEEPING_WAITER on the
+		 * console_sem.
+		 *
+		 * For the fast path we pass off the printing to the waiter.
+		 * The waiter will continue printing on its CPU and up() the
+		 * console_sem.
+		 *
+		 * For slow path we need to go through the 'normal' return
+		 * from console_unlock(), which involves up_console_sem().
+		 *
+		 * When all writing has finished, the last printer will wake
+		 * up klogd.
+		 */
+		if (do_handoff == SPINNING_WAITER_HANDOFF) {
+			console_handoff_printing();
+			printk_safe_exit_irqrestore(flags);
+			preempt_enable();
+			return;
+		}
 		printk_safe_exit_irqrestore(flags);
 
 		if (!do_handoff && do_cond_resched) {
@@ -2522,8 +2624,11 @@ void console_unlock(void)
 	 * if there's something to flush. In case when trylock fails,
 	 * there's a new owner and the console_unlock() from them will
 	 * do the flush, no worries.
+	 *
+	 * Do not retry printing if we have SLEEPING_WAITER, up() should
+	 * wake him up.
 	 */
-	if (!do_handoff) {
+	if (do_handoff == DONT_HANDOFF) {
 		raw_spin_lock(&logbuf_lock);
 		retry = console_seq != log_next_seq;
 		if (!retry)
@@ -2990,6 +3095,9 @@ static struct syscore_ops printk_syscore_ops = {
 static int printk_kthread_func(void *data)
 {
 	while (1) {
+		unsigned long flags;
+		bool spin = false;
+
 		set_current_state(TASK_INTERRUPTIBLE);
 		/*
 		 * We must check `printk_emergency' as well, to let
@@ -3012,7 +3120,57 @@ static int printk_kthread_func(void *data)
 		if (kthread_should_park())
 			kthread_parkme();
 
-		console_lock();
+		local_irq_save(flags);
+		raw_spin_lock(&console_owner_lock);
+		/*
+		 * Printing CPU has requested printk_kthread offloading. There
+		 * are two cases here:
+		 * a) `active_console_owner' is set, so we can be a
+		 *     SPINNING_WAITER and wait busy-looping for printing CPU
+		 *     to transfer console_sem ownership to us.
+		 *
+		 * b) otherwise, the printing task has IRQs enabled and it
+		 *    may be interrupted anytime, while still holding the
+		 *    console_sem. We must become a SLEEPING_WAITER and do
+		 *    console_lock(). The printing task will do up() as soon
+		 *    as possible.
+		 */
+		if (active_console_owner) {
+			spin = true;
+			WRITE_ONCE(console_handoff_waiter,
+					SPINNING_WAITER_HANDOFF);
+		} else {
+			spin = false;
+			WRITE_ONCE(console_handoff_waiter,
+					SLEEPING_WAITER_HANDOFF);
+		}
+		raw_spin_unlock(&console_owner_lock);
+
+		if (spin) {
+			/* We spin waiting for the owner to release us */
+			spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+			/* Owner will clear SPINNING_WAITER bit on hand off */
+			while (READ_ONCE(console_handoff_waiter) ==
+					SPINNING_WAITER_HANDOFF)
+				cpu_relax();
+
+			spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+			local_irq_restore(flags);
+			/*
+			 * The owner passed the console lock to us.
+			 * Since we did not spin on console lock, annotate
+			 * this as a trylock. Otherwise lockdep will
+			 * complain.
+			 */
+			mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+		} else {
+			local_irq_restore(flags);
+			console_lock();
+
+			/* We've been woken up by up() */
+			WRITE_ONCE(console_handoff_waiter, DONT_HANDOFF);
+		}
+
 		console_unlock();
 
 		/* We might have been blocked on console_sem */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 0/4] printk: offloading testing module/trace events
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (11 preceding siblings ...)
  2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
@ 2017-12-04 13:53 ` Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
                     ` (3 more replies)
  2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
  2018-01-05  2:54 ` Sergey Senozhatsky
  14 siblings, 4 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:53 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Hello,

	*** FOR TESTING ONLY ***

	printk testing module and some dirty hacks I use for
offloading verification.

Sergey Senozhatsky (4):
  printk/lib: add offloading trace events and test_printk module
  printk/lib: simulate slow consoles
  printk: add offloading takeover traces
  printk: add task name and CPU to console messages

 include/trace/events/printk.h |  26 +++
 kernel/printk/printk.c        |  40 ++++
 lib/Kconfig.debug             |   3 +
 lib/Makefile                  |   1 +
 lib/test_printk.c             | 423 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 493 insertions(+)
 create mode 100644 lib/test_printk.c

-- 
2.15.1

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 1/4] printk/lib: add offloading trace events and test_printk module
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
@ 2017-12-04 13:53   ` Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:53 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

*** FOR TESTING ***

Add console_unlock() offloading trace events and a new test_printk
testing module. test_printk does a number of offloading/handoff
tests - console_unlock() with preemption disabled, under rcu read
lock, with IRQs disabled, and so on.
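
Each test follows the same basic shape: fill the logbuf while holding
console_sem, then call console_unlock() from the restricted context
under test and let the offloading/handoff machinery deal with the
print out. A minimal example, simplified from the module below:

	static void test_nonpreemptible_console_unlock(void)
	{
		unsigned long n = 0;

		console_lock();
		/* messages are only appended to the logbuf here */
		while (n++ < max_num_messages)
			pr_info("test message %lu\n", n);

		/* the interesting part: print out with preemption off */
		preempt_disable();
		console_unlock();
		preempt_enable();
	}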

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/trace/events/printk.h |  26 +++
 kernel/printk/printk.c        |  17 ++
 lib/Kconfig.debug             |   3 +
 lib/Makefile                  |   1 +
 lib/test_printk.c             | 415 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 462 insertions(+)
 create mode 100644 lib/test_printk.c

diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
index 13d405b2fd8b..d883f5015cd2 100644
--- a/include/trace/events/printk.h
+++ b/include/trace/events/printk.h
@@ -31,6 +31,32 @@ TRACE_EVENT(console,
 
 	TP_printk("%s", __get_str(msg))
 );
+
+TRACE_EVENT(offloading,
+	TP_PROTO(char *ev,
+		char *key1,
+		unsigned long value1),
+
+	TP_ARGS(ev, key1, value1),
+
+	TP_STRUCT__entry(
+		__string(event, ev)
+
+		__string(__key1, key1)
+		__field(u64, __value1)
+	),
+
+	TP_fast_assign(
+		__assign_str(event, ev ? ev : " ? ");
+		__assign_str(__key1, key1 ? key1 : " -- ");
+		__entry->__value1 = value1;
+	),
+
+	TP_printk("%s %s:%llu",
+		__get_str(event),
+		__get_str(__key1),
+		__entry->__value1)
+);
 #endif /* _TRACE_PRINTK_H */
 
 /* This part must be outside protection */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2395f18fec53..d4e1abb36d3f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -662,6 +662,12 @@ should_handoff_printing(u64 printing_start_ts)
 
 	/* A new task - reset the counters. */
 	if (printing_task != current) {
+		trace_offloading_rcuidle("reset counters, prev_task data",
+				printing_task ?
+					printing_task->comm :
+					"NO_PREVIOUS_TASK",
+				this_cpu_read(printing_elapsed));
+
 		touch_printk_offloading_watchdog();
 		printing_task = current;
 		return DONT_HANDOFF;
@@ -694,6 +700,9 @@ should_handoff_printing(u64 printing_start_ts)
 		 * console_unlock(), it will have another full
 		 * `offloading_threshold()' time slice.
 		 */
+		trace_offloading_rcuidle("[!] forced up()",
+				"elapsed", this_cpu_read(printing_elapsed));
+
 		for_each_possible_cpu(cpu)
 			touch_printk_offloading_watchdog_on_cpu(cpu);
 		return PRINTK_KTHREAD_HANDOFF;
@@ -707,6 +716,10 @@ should_handoff_printing(u64 printing_start_ts)
 	cpumask_copy(printk_offload_cpus, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), printk_offload_cpus);
 
+	trace_offloading_rcuidle("wake up kthread",
+			"elapsed",
+			this_cpu_read(printing_elapsed));
+
 	/*
 	 * If this_cpu is the one and only online CPU, then try to wake up
 	 * `printk_kthread' on it.
@@ -3173,6 +3186,10 @@ static int printk_kthread_func(void *data)
 
 		console_unlock();
 
+		trace_offloading_rcuidle("kthread released console_sem",
+				"PRINTK_PENDING_OUTPUT",
+				test_bit(PRINTK_PENDING_OUTPUT, &printk_pending));
+
 		/* We might have been blocked on console_sem */
 		if (kthread_should_park())
 			kthread_parkme();
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c076234802d9..9e37988b0cfa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1948,6 +1948,9 @@ config TEST_DEBUG_VIRTUAL
 
 	  If unsure, say N.
 
+config TEST_PRINTK
+	tristate "Test printk() and console_unlock()"
+
 endmenu # runtime tests
 
 config MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index f495fd46fdc7..65667a03443d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_TEST_UUID) += test_uuid.o
 obj-$(CONFIG_TEST_PARMAN) += test_parman.o
 obj-$(CONFIG_TEST_KMOD) += test_kmod.o
 obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
+obj-$(CONFIG_TEST_PRINTK) += test_printk.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_printk.c b/lib/test_printk.c
new file mode 100644
index 000000000000..9b01a03ef385
--- /dev/null
+++ b/lib/test_printk.c
@@ -0,0 +1,415 @@
+/*
+ * Test cases for printk() offloading [console_unlock()] functionality.
+ *
+ * Copyright (c) 2017 Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/sched.h>
+#include <linux/printk.h>
+#include <linux/console.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
+#include <linux/hrtimer.h>
+#include <linux/sysfs.h>
+#include <linux/device.h>
+#include <linux/rcupdate.h>
+
+#define MAX_MESSAGES	4242
+#define ALL_TESTS	(~0UL)
+
+static unsigned long max_num_messages;
+static unsigned long tests_mask;
+
+static DEFINE_MUTEX(hog_mutex);
+
+static struct hrtimer printk_timer;
+static ktime_t timer_interval;
+
+static int test_done;
+
+#define TEST_PREEMPT_CONSOLE_UNLOCK	(1 << 0)
+#define TEST_NONPREEMPT_CONSOLE_UNLOCK	(1 << 1)
+#define TEST_NOIRQ_CONSOLE_UNLOCK	(1 << 2)
+#define TEST_NONPREEMPT_PRINTK_STORM	(1 << 3)
+#define TEST_NOIRQ_PRINTK_STORM	(1 << 4)
+#define TEST_NONPREEMPT_PRINTK_HOGGER	(1 << 5)
+#define TEST_NOIRQ_PRINTK_HOGGER	(1 << 6)
+#define TEST_PREEMPT_PRINTK_EMERG_SYNC	(1 << 7)
+#define TEST_RCU_LOCK_CONSOLE_UNLOCK	(1 << 8)
+
+static void test_preemptible_console_unlock(void)
+{
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	console_lock();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+	console_unlock();
+}
+
+static void test_nonpreemptible_console_unlock(void)
+{
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	num_messages = 0;
+	console_lock();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+
+	preempt_disable();
+	console_unlock();
+	preempt_enable();
+}
+
+static void test_rculock_console_unlock(void)
+{
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	num_messages = 0;
+	console_lock();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+
+	rcu_read_lock();
+	console_unlock();
+	rcu_read_unlock();
+}
+
+static void test_noirq_console_unlock(void)
+{
+	unsigned long flags;
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	num_messages = 0;
+	console_lock();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+
+	local_irq_save(flags);
+	console_unlock();
+	local_irq_restore(flags);
+}
+
+static void test_nonpreemptible_printk_storm(void)
+{
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	num_messages = 0;
+	preempt_disable();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+	preempt_enable();
+}
+
+static void test_noirq_printk_storm(void)
+{
+	unsigned long flags;
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	num_messages = 0;
+	local_irq_save(flags);
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+	local_irq_restore(flags);
+}
+
+/*
+ * hogger printk() tests are based on Tejun Heo's code
+ */
+static void nonpreemptible_printk_workfn(struct work_struct *work)
+{
+	unsigned long num_messages = 0;
+
+	while (num_messages++ < max_num_messages) {
+		mutex_lock(&hog_mutex);
+		mutex_unlock(&hog_mutex);
+		preempt_disable();
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+		preempt_enable();
+		cond_resched();
+	}
+}
+static DECLARE_WORK(nonpreemptible_printk_work, nonpreemptible_printk_workfn);
+
+static void hog_printk_workfn(struct work_struct *work)
+{
+	unsigned long num_messages = 0;
+
+	while (num_messages++ < max_num_messages) {
+		mutex_lock(&hog_mutex);
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+		mutex_unlock(&hog_mutex);
+		cond_resched();
+	}
+}
+static DECLARE_WORK(hog_printk_work, hog_printk_workfn);
+
+static void test_nonpreemptible_printk_hogger(void)
+{
+	pr_err("=== TEST %s\n", __func__);
+
+	queue_work_on(0, system_wq, &nonpreemptible_printk_work);
+	msleep(42);
+	queue_work_on(1, system_wq, &hog_printk_work);
+
+	msleep(420);
+
+	flush_work(&nonpreemptible_printk_work);
+	flush_work(&hog_printk_work);
+
+	console_lock();
+	console_unlock();
+}
+
+static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
+{
+	static long iter = 1024;
+	unsigned long num_messages = 0;
+
+	if (!console_trylock()) {
+		while (num_messages++ < max_num_messages / 10) {
+			pr_info("=== %s [F] Append message %lu out of %lu\n",
+					__func__,
+					num_messages,
+					max_num_messages / 10);
+		}
+	} else {
+		while (num_messages++ < max_num_messages) {
+			pr_info("=== %s [S] Append message %lu out of %lu\n",
+					__func__,
+					num_messages,
+					max_num_messages);
+		}
+
+		console_unlock();
+	}
+
+	hrtimer_forward_now(&printk_timer, timer_interval);
+	if (--iter < 1)
+		return HRTIMER_NORESTART;
+	return HRTIMER_RESTART;
+}
+
+static void preempt_printk_workfn(struct work_struct *work)
+{
+	unsigned long num_messages = 0;
+
+	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	printk_timer.function = printk_timerfn;
+	timer_interval = ktime_set(0, 2 * NSEC_PER_MSEC);
+	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);
+
+	while (num_messages++ < max_num_messages) {
+		preempt_disable();
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+		preempt_enable();
+	}
+}
+static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);
+
+static void test_noirq_printk_hogger(void)
+{
+	pr_err("=== TEST %s\n", __func__);
+
+	queue_work_on(0, system_wq, &preempt_printk_work);
+
+	msleep(420);
+
+	flush_work(&preempt_printk_work);
+	hrtimer_cancel(&printk_timer);
+
+	console_lock();
+	console_unlock();
+}
+
+static void test_preemptible_printk_emergency_sync(void)
+{
+	unsigned long num_messages = 0;
+
+	pr_err("=== TEST %s\n", __func__);
+
+	console_lock();
+	while (num_messages++ < max_num_messages)
+		pr_info("=== %s Append message %lu out of %lu\n",
+				__func__,
+				num_messages,
+				max_num_messages);
+	console_unlock();
+	msleep(840);
+
+	printk_emergency_begin_sync();
+	console_lock();
+	console_unlock();
+	printk_emergency_end_sync();
+}
+
+static void wait_for_test(const char *test_name)
+{
+	int done = 0;
+
+	do {
+		pr_err("... waiting for %s\n", test_name);
+		msleep(1000);
+
+		if (console_trylock()) {
+			console_unlock();
+			done = 1;
+		}
+	} while (done == 0);
+}
+
+static void run_tests(void)
+{
+	if (tests_mask & TEST_PREEMPT_CONSOLE_UNLOCK) {
+		test_preemptible_console_unlock();
+		wait_for_test("preemptible_console_unlock()");
+	}
+
+	if (tests_mask & TEST_NONPREEMPT_CONSOLE_UNLOCK) {
+		test_nonpreemptible_console_unlock();
+		wait_for_test("nonpreemptible_console_unlock()");
+	}
+
+	if (tests_mask & TEST_NOIRQ_CONSOLE_UNLOCK) {
+		test_noirq_console_unlock();
+		wait_for_test("noirq_console_unlock()");
+	}
+
+	if (tests_mask & TEST_NONPREEMPT_PRINTK_STORM) {
+		test_nonpreemptible_printk_storm();
+		wait_for_test("nonpreemptible_printk_storm()");
+	}
+
+	if (tests_mask & TEST_NOIRQ_PRINTK_STORM) {
+		test_noirq_printk_storm();
+		wait_for_test("noirq_printk_storm()");
+	}
+
+	if (tests_mask & TEST_NONPREEMPT_PRINTK_HOGGER) {
+		test_nonpreemptible_printk_hogger();
+		wait_for_test("nonpreemptible_printk_hogger()");
+	}
+
+	if (tests_mask & TEST_NOIRQ_PRINTK_HOGGER) {
+		test_noirq_printk_hogger();
+		wait_for_test("noirq_printk_hogger()");
+	}
+
+	if (tests_mask & TEST_PREEMPT_PRINTK_EMERG_SYNC) {
+		test_preemptible_printk_emergency_sync();
+		wait_for_test("preemptible_printk_emergency_sync()");
+	}
+
+	if (tests_mask & TEST_RCU_LOCK_CONSOLE_UNLOCK) {
+		test_rculock_console_unlock();
+		wait_for_test("rculock_console_unlock()");
+	}
+
+	test_done = 1;
+}
+
+static ssize_t test_done_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	char *s = buf;
+
+	s += sprintf(s, "%d\n", test_done);
+	return (s - buf);
+}
+static DEVICE_ATTR_RO(test_done);
+
+static struct kobject *test_kobj;
+
+static struct attribute *test_attrs[] = {
+	&dev_attr_test_done.attr,
+	NULL,
+};
+
+static const struct attribute_group attr_group = {
+	.attrs = test_attrs,
+};
+
+static int __init test_init(void)
+{
+	int ret;
+
+	if (!max_num_messages)
+		max_num_messages = MAX_MESSAGES;
+
+	if (!tests_mask)
+		tests_mask = ALL_TESTS;
+
+	test_kobj = kobject_create_and_add("test_printk", NULL);
+	if (!test_kobj)
+		return -ENOMEM;
+	ret = sysfs_create_group(test_kobj, &attr_group);
+	if (ret) {
+		kobject_put(test_kobj);
+		return ret;
+	}
+
+	run_tests();
+	return 0;
+}
+
+static void __exit test_exit(void)
+{
+	sysfs_remove_group(test_kobj, &attr_group);
+	kobject_put(test_kobj);
+}
+
+module_param(max_num_messages, ulong, 0);
+MODULE_PARM_DESC(max_num_messages, "Number of messages to printk() in each test");
+
+module_param(tests_mask, ulong, 0);
+MODULE_PARM_DESC(tests_mask, "Which tests to run");
+
+module_init(test_init);
+module_exit(test_exit);
+
+MODULE_LICENSE("GPL");
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 2/4] printk/lib: simulate slow consoles
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
@ 2017-12-04 13:53   ` Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
  3 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:53 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

*** FOR TESTING ***

Add a hack to delay console_drivers and simulate slow console(s). Doing
something like

	preempt_disable();
	while (...) {
		printk();
		delay();
	}
	preempt_enable();

is not correct. First, not every printk() ends up in console_unlock();
second, the delay should happen in console_unlock() if we want to test
printk() offloading, not outside of printk().
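
In other words, the delay has to sit right next to call_console_drivers()
inside console_unlock()'s printing loop; a condensed view of the hack
added below (declarations omitted):

	call_console_drivers(ext_text, ext_len, text, len);

	/* pretend that every character takes a while to hit the console */
	for (num_chars = 0; num_chars < len; num_chars++) {
		num_iter = 0;
		while (num_iter++ < __CONSOLE_DRIVERS_DELAY__)
			cpu_relax();
	}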

A simple test_printk.sh script to run the tests:

8<-----------------------------------------------------------------------------

TEST_CASE=1
_MAX_NUM_MESSAGES=1024
_CONSOLE_DRIVERS_DELAY=3500

sysctl kernel.watchdog_thresh=5

if [ "z$MAX_NUM_MESSAGES" != "z" ]; then
	_MAX_NUM_MESSAGES=$MAX_NUM_MESSAGES
fi

if [ "z$CONSOLE_DRIVERS_DELAY" != "z" ]; then
	_CONSOLE_DRIVERS_DELAY=$CONSOLE_DRIVERS_DELAY
fi

while [ $TEST_CASE -le 256 ]; do
	echo 1 > /sys/kernel/debug/tracing/events/printk/offloading/enable
	echo 1 > /sys/kernel/debug/tracing/trace

	echo "Executing test $TEST_CASE"

	modprobe test_printk max_num_messages=$_MAX_NUM_MESSAGES \
		console_drivers_delay=$_CONSOLE_DRIVERS_DELAY \
		tests_mask=$TEST_CASE

	TEST_DONE=`cat /sys/test_printk/test_done`
	while [ $TEST_DONE -ne 1 ]; do
		sleep 1s;
		let TEST_DONE=`cat /sys/test_printk/test_done`
	done

	rmmod test_printk

	echo 0 > /sys/kernel/debug/tracing/events/printk/offloading/enable
	cat /sys/kernel/debug/tracing/trace > /tmp/trace-test_case-$TEST_CASE

	echo "Done... cat /tmp/trace-test_case-$TEST_CASE"
	cat /tmp/trace-test_case-$TEST_CASE

	echo "================================================================"

	let TEST_CASE=$TEST_CASE*2
done

sysctl kernel.watchdog_thresh=10

8<-----------------------------------------------------------------------------

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 15 +++++++++++++++
 lib/test_printk.c      | 10 +++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d4e1abb36d3f..01626f2f42bd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -75,6 +75,9 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+int __CONSOLE_DRIVERS_DELAY__ = 0;
+EXPORT_SYMBOL(__CONSOLE_DRIVERS_DELAY__);
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -2584,6 +2587,18 @@ void console_unlock(void)
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
+
+		/* pretend we have a slow console */
+		{
+			volatile int num_chars, num_iter;
+
+			for (num_chars = 0; num_chars < len; num_chars++) {
+				num_iter = 0;
+				while (num_iter++ < __CONSOLE_DRIVERS_DELAY__)
+					cpu_relax();
+			}
+		}
+
 		start_critical_timings();
 
 		/* Must be called under printk_safe */
diff --git a/lib/test_printk.c b/lib/test_printk.c
index 9b01a03ef385..a030f1e61745 100644
--- a/lib/test_printk.c
+++ b/lib/test_printk.c
@@ -22,9 +22,10 @@
 #define MAX_MESSAGES	4242
 #define ALL_TESTS	(~0UL)
 
+static int console_drivers_delay;
+
 static unsigned long max_num_messages;
 static unsigned long tests_mask;
-
 static DEFINE_MUTEX(hog_mutex);
 
 static struct hrtimer printk_timer;
@@ -148,6 +149,8 @@ static void test_noirq_printk_storm(void)
 	local_irq_restore(flags);
 }
 
+extern int __CONSOLE_DRIVERS_DELAY__;
+
 /*
  * hogger printk() tests are based on Tejun Heo's code
  */
@@ -381,6 +384,8 @@ static int __init test_init(void)
 	if (!max_num_messages)
 		max_num_messages = MAX_MESSAGES;
 
+	__CONSOLE_DRIVERS_DELAY__ = console_drivers_delay;
+
 	if (!tests_mask)
 		tests_mask = ALL_TESTS;
 
@@ -409,6 +414,9 @@ MODULE_PARM_DESC(max_num_messages, "Number of messages to printk() in each test");
 module_param(tests_mask, ulong, 0);
 MODULE_PARM_DESC(tests_mask, "Which tests to run");
 
+module_param(console_drivers_delay, int, 0);
+MODULE_PARM_DESC(console_drivers_delay, "Delay console drivers");
+
 module_init(test_init);
 module_exit(test_exit);
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 3/4] printk: add offloading takeover traces
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
@ 2017-12-04 13:53   ` Sergey Senozhatsky
  2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
  3 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:53 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

*** FOR TESTING ***

Add more trace events for printing offloading (handoff).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 01626f2f42bd..a34a839e6045 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -620,6 +620,8 @@ static unsigned long spinning_waiter_handoff_disable(void)
 
 static void console_handoff_printing(void)
 {
+	trace_offloading_rcuidle("handoff", " ", 0);
+
 	WRITE_ONCE(console_handoff_waiter, DONT_HANDOFF);
 	/* The waiter is now free to continue */
 	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
@@ -3172,6 +3174,9 @@ static int printk_kthread_func(void *data)
 			WRITE_ONCE(console_handoff_waiter,
 					SLEEPING_WAITER_HANDOFF);
 		}
+
+		trace_offloading_rcuidle("set", "console_handoff_waiter",
+				READ_ONCE(console_handoff_waiter));
 		raw_spin_unlock(&console_owner_lock);
 
 		if (spin) {
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 4/4] printk: add task name and CPU to console messages
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
                     ` (2 preceding siblings ...)
  2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
@ 2017-12-04 13:53   ` Sergey Senozhatsky
  3 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-04 13:53 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

*** FOR TESTING ***

Add current->comm/smp_processor_id() prefix to every console line.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a34a839e6045..a36cc6b9148e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1540,6 +1540,9 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
 		}
 	}
 
+	if (buf)
+		len += sprintf(buf, "{%s/%d}", current->comm,
+				smp_processor_id());
 	len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
 	return len;
 }
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (12 preceding siblings ...)
  2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
@ 2017-12-14 14:27 ` Petr Mladek
  2017-12-14 14:39   ` Sergey Senozhatsky
  2017-12-14 15:25   ` Tejun Heo
  2018-01-05  2:54 ` Sergey Senozhatsky
  14 siblings, 2 replies; 79+ messages in thread
From: Petr Mladek @ 2017-12-14 14:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, Tejun Heo,
	linux-kernel, Sergey Senozhatsky

On Mon 2017-12-04 22:48:13, Sergey Senozhatsky wrote:
> Hello,
> 
> 	RFC
> 
> 	A new version, yet another rework. Lots of changes, e.g. hand off
> control based on Steven's patch. Another change is that this time around
> we finally have a kernel module to test printk offloading (YAYY!). The
> module tests a bunch use cases; we also have trace printk events to...
> trace offloading.

Ah, I know that it was me who was pessimistic about Steven's approach[1]
and persuaded you that the offloading idea was still alive. But I am less
sure now.

My three main concerns about Steven's approach were:

1. I was afraid that it might introduce new type of deadlocks.

   But it seems that it is quite safe after all.


2. Steven's code, implementing the hand shake, is far from trivial.
   Few people were confused and reported false bugs.

   But the basic idea is pretty simple and straightforward. If
   we manage to encapsulate it into a few helpers, it might become
   rather self-contained and maintainable. In each case, the needed
   changes are much smaller than I expected.


3. Soft-lockups are still theoretically possible with Steven's
   approach.

   But it seems to be quite efficient in many real life scenarios,
   including Tetsuo's stress testing. Or am I wrong?



Therefore I tend to give Steven's solution a chance before this
combined approach.

In each case, I do not feel comfortable with this combined solution.
I know that it might work much better that the two approaches
alone. But it has the complexity and possible risks of both
implementations. I would prefer to go with smaller steps.

[1] https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home


> I'll post the testing script and test module in reply
> to this mail. So... let's have some progress ;) The code is not completely
> awesome, but not tremendously difficult at the same time. We can verify
> the approach/design (we have tests and traces now) first and then start
> improving the code.

The testing module and code is great. Thanks a lot for posting it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
@ 2017-12-14 14:39   ` Sergey Senozhatsky
  2017-12-15 15:55     ` Steven Rostedt
  2017-12-14 15:25   ` Tejun Heo
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-14 14:39 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	Tejun Heo, linux-kernel, Sergey Senozhatsky

On (12/14/17 15:27), Petr Mladek wrote:
>
> Therefore I tend to give Steven's solution a chance before this
> combined approach.
> 

have you seen this https://marc.info/?l=linux-kernel&m=151015850209859
or this https://marc.info/?l=linux-kernel&m=151011840830776&w=2
or this https://marc.info/?l=linux-kernel&m=151020275921953&w=2

?

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
  2017-12-14 14:39   ` Sergey Senozhatsky
@ 2017-12-14 15:25   ` Tejun Heo
  2017-12-14 17:55     ` Steven Rostedt
  1 sibling, 1 reply; 79+ messages in thread
From: Tejun Heo @ 2017-12-14 15:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello, Petr.

On Thu, Dec 14, 2017 at 03:27:09PM +0100, Petr Mladek wrote:
> Ah, I know that it was me who was pessimistic about Steven's approach[1]
> and persuaded you that offloading idea was still alive. But I am less
> sure now.

So, I don't really care which one gets in as long as the livelock
problem is fixed although to my obviously partial eyes the two
alternatives seem overly complex.  That said,

> My three main concerns about Steven's approach were:
> 
> 1. I was afraid that it might introduce new type of deadlocks.
> 
>    But it seems that it is quite safe after all.
> 
> 
> 2. Steven's code, implementing the hand shake, is far from trivial.
>    Few people were confused and reported false bugs.
> 
>    But the basic idea is pretty simple and straightforward. If
>    we manage to encapsulate it into few helpers, it might become
>    rather self-contained and maintainable. In each case, the needed
>    changes are much smaller than I expected.
> 
> 
> 3. Soft-lockups are still theoretically possible with Steven's
>    approach.
> 
>    But it seems to be quite efficient in many real life scenarios,
>    including Tetsuo's stress testing. Or am I wrong?

AFAICS, Steven's approach doesn't fix the livelock that we see quite
often in the fleet where we don't have a safe context to keep flushing
messages.  This isn't theoretical at all.  You simply don't have a
safe context on the cpu to go to.  I said I'd come back with a repro
case but haven't had a chance to yet.  I'll try to do it before the
end of the year, but idk this is pretty obvious.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 15:25   ` Tejun Heo
@ 2017-12-14 17:55     ` Steven Rostedt
  2017-12-14 18:11       ` Tejun Heo
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-14 17:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Thu, 14 Dec 2017 07:25:51 -0800
Tejun Heo <tj@kernel.org> wrote:

> > 3. Soft-lockups are still theoretically possible with Steven's
> >    approach.
> > 
> >    But it seems to be quite efficient in many real life scenarios,
> >    including Tetsuo's stress testing. Or am I wrong?  
> 
> AFAICS, Steven's approach doesn't fix the livelock that we see quite
> often in the fleet where we don't have a safe context to keep flushing
> messages.  This isn't theoretical at all.  You simply don't have a
> safe context on the cpu to go to.  I said I'd come back with a repro
> case but haven't had a chance to yet.  I'll try to do it before the
> end of the year, but idk this is pretty obvious.

Yes! Please create a reproducer, because I still don't believe there is
one. And it's all hand waving until there's an actual report that we can
lock up the system with my approach.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 17:55     ` Steven Rostedt
@ 2017-12-14 18:11       ` Tejun Heo
  2017-12-14 18:21         ` Steven Rostedt
  2017-12-15  2:10         ` Sergey Senozhatsky
  0 siblings, 2 replies; 79+ messages in thread
From: Tejun Heo @ 2017-12-14 18:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hey, Steven.

On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> Yes! Please create a reproducer, because I still don't believe there is
> one. And it's all hand waving until there's an actual report that we can
> lock up the system with my approach.

Yeah, will do, but out of curiosity, Sergey and I already described
what the root problem was and you didn't really seem to take that.  Is
that because the explanation didn't make sense to you or us
misunderstanding what your code does?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 18:11       ` Tejun Heo
@ 2017-12-14 18:21         ` Steven Rostedt
  2017-12-22  0:09           ` Tejun Heo
  2017-12-15  2:10         ` Sergey Senozhatsky
  1 sibling, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-14 18:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Thu, 14 Dec 2017 10:11:53 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hey, Steven.
> 
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.  
> 
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that.  Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

Can you post the message id of the discussion you are referencing.
Because I've been swamped with other activities and only been skimming
these threads.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 18:11       ` Tejun Heo
  2017-12-14 18:21         ` Steven Rostedt
@ 2017-12-15  2:10         ` Sergey Senozhatsky
  2017-12-15  3:18           ` Steven Rostedt
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-15  2:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

Hello,

On (12/14/17 10:11), Tejun Heo wrote:
> Hey, Steven.
> 
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.
> 
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that.  Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

I second _everything_ that Tejun has said.


Steven, your approach works ONLY when we have the following preconditions:

 a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
    etc) context

        what does guarantee that? what happens if there is NO non-atomic
        CPU or that non-atomic simply misses the console_owner != false
        point? we are going to conclude

        "if printk() doesn't work for you, it's because you are holding it wrong"?


        what if that non-atomic CPU does not call printk(), but instead
        it does console_lock()/console_unlock()? why there is no handoff?

        CPU0				CPU1 ~ CPU10
					in atomic contexts [!]. ping-ponging console_sem
					ownership to each other. while what they really
					need to do is to simply up() and let CPU0 to
					handle it.
					printk
	console_lock()
	 schedule()
					...
					printk
					printk
					...
					printk
					printk

					up()

	// woken up
	console_unlock()

        why do we make an emphasis on fixing vprintk_printk()?


 b) non-atomic CPU sees console_owner set (which is set for a very short
    period of time)

        again. what if that non-atomic CPU does not see console_owner?
        "don't use printk()"?

 c) the task that is looping in console_unlock() sees non-atomic CPU when
    console_owner is set.


IOW, we need to have


   the right CPU (a) at the very right moment (b && c) doing the very right thing.


   * and the "very right moment" is tiny and additionally depends
     on a foreign CPU [the one that is looping in console_unlock()].



a simple question - how is that going to work for everyone? are we
"fixing" a small fraction of possible use-cases?



Steven, I thought we reached the agreement [**] that the solution we should
be working on is a combination of printk_kthread and console_sem hand
off. Simply because it adds the missing "there is a non-atomic CPU wishing
to console_unlock()" thing.

	lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com

	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
**	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
**	https://marc.info/?l=linux-kernel&m=151020565222469&w=2


what am I missing?

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  2:10         ` Sergey Senozhatsky
@ 2017-12-15  3:18           ` Steven Rostedt
  2017-12-15  5:06             ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15  3:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri, 15 Dec 2017 11:10:24 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Steven, your approach works ONLY when we have the following preconditions:
> 
>  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
>     etc) context
> 
>         what does guarantee that? what happens if there is NO non-atomic
>         CPU or that non-atomic simply misses the console_owner != false
>         point? we are going to conclude
> 
>         "if printk() doesn't work for you, it's because you are holding it wrong"?
> 
> 
>         what if that non-atomic CPU does not call printk(), but instead
>         it does console_lock()/console_unlock()? why there is no handoff?
> 
>         CPU0				CPU1 ~ CPU10
> 					in atomic contexts [!]. ping-ponging console_sem
> 					ownership to each other. while what they really
> 					need to do is to simply up() and let CPU0 to
> 					handle it.
> 					printk
> 	console_lock()
> 	 schedule()
> 					...
> 					printk
> 					printk
> 					...
> 					printk
> 					printk
> 
> 					up()
> 
> 	// woken up
> 	console_unlock()
> 
>         why do we make an emphasis on fixing vprintk_printk()?

Where do we do the above? And has this been proven to be an issue? If
it has, I think it's a separate issue from what I proposed, as what I
proposed fixes the case where lots of CPUs are doing printks, and
only one actually does the write.

> 
> 
>  b) non-atomic CPU sees console_owner set (which is set for a very short
>     period of time)
> 
>         again. what if that non-atomic CPU does not see console_owner?
>         "don't use printk()"?

May I ask, why are we doing the printk in the first place?

> 
>  c) the task that is looping in console_unlock() sees non-atomic CPU when
>     console_owner is set.

I haven't looked at the latest code, but my last patch didn't care
about "atomic" and "non-atomic" issues, because I don't know if that is
indeed an issue in the real world.

> 
> 
> IOW, we need to have
> 
> 
>    the right CPU (a) at the very right moment (b && c) doing the very right thing.
> 
> 
>    * and the "very right moment" is tiny and additionally depends
>      on a foreign CPU [the one that is looping in console_unlock()].
> 
> 
> 
> a simple question - how is that going to work for everyone? are we
> "fixing" a small fraction of possible use-cases?

Still sounds like you are ;-)

> 
> 
> 
> Steven, I thought we reached the agreement [**] that the solution we should
> be working on is a combination of printk_kthread and console_sem hand
> off. Simply because it adds the missing "there is a non-atomic CPU wishing
> to console_unlock()" thing.
> 
> 	lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com
> 
> 	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> 	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> 	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> 	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> 	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> **	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> **	https://marc.info/?l=linux-kernel&m=151020565222469&w=2

I'm still fine with the hybrid approach, but I want to see a problem
first before we fix it.

> 
> 
> what am I missing?

The reproducer. Let Tejun do the test with just my patch, and if it
still has problems, then we can add more logic to the code. I like to
take things one step at a time. What I'm seeing is that there was a
problem that could be solved with my solution, but during this process,
people have found hundreds of theoretical problems and started down the
path to solve each of them. I want to see a real bug, before we go down
the path of having to have external threads and such, to solve a bug
that we don't really know exists yet.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  3:18           ` Steven Rostedt
@ 2017-12-15  5:06             ` Sergey Senozhatsky
  2017-12-15  6:52               ` Sergey Senozhatsky
                                 ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-15  5:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Petr Mladek,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

Hello,

On (12/14/17 22:18), Steven Rostedt wrote:
> > Steven, your approach works ONLY when we have the following preconditions:
> > 
> >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> >     etc) context
> > 
> >         what does guarantee that? what happens if there is NO non-atomic
> >         CPU or that non-atomic simply misses the console_owner != false
> >         point? we are going to conclude
> > 
> >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > 
> > 
> >         what if that non-atomic CPU does not call printk(), but instead
> >         it does console_lock()/console_unlock()? why there is no handoff?
> > 
> >         CPU0				CPU1 ~ CPU10
> > 					in atomic contexts [!]. ping-ponging console_sem
> > 					ownership to each other. while what they really
> > 					need to do is to simply up() and let CPU0 to
> > 					handle it.
> > 					printk
> > 	console_lock()
> > 	 schedule()
> > 					...
> > 					printk
> > 					printk
> > 					...
> > 					printk
> > 					printk
> > 
> > 					up()
> > 
> > 	// woken up
> > 	console_unlock()
> > 
> >         why do we make an emphasis on fixing vprintk_printk()?
> 
> Where do we do the above? And has this been proven to be an issue?

um... hundreds of cases.

deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
happening at the same moment + NMI backtraces from all the CPUs (more
than 3 cpus) that follows the lockups, over not-so-fast serial console.
exactly the bug report I received two days ago. so which one of the CPUs
here is a good candidate to successfully emit all of the pending logbuf
entries? none. all of them either have local IRQs disabled, or dump_stack()
from either backtrace IPI or backtrace NMI (depending on the configuration).


do we periodically do console_lock() on a running system? yes, we do.
add to console_unlock()

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..1c811f6d94bf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2143,6 +2143,10 @@ void console_unlock(void)
        bool wake_klogd = false;
        bool do_cond_resched, retry;
 
+       if (!(current->flags & PF_KTHREAD))
+               dump_stack();
+
+
        if (console_suspended) {
                up_console_sem();
                return;

---

and just boot the system.


I work for a company that has several thousand engineers spread
across the globe. and people do use printk(), and issues do happen.

the scenarios that Tejun and I talk about are not theoretical. if those
scenarios are completely theoretical, as you suggest, then, OK, what
exactly guarantees that

	whenever atomic CPUs printk there is always a non-atomic
	CPU to take over the printing?



> >  b) non-atomic CPU sees console_owner set (which is set for a very short
> >     period of time)
> > 
> >         again. what if that non-atomic CPU does not see console_owner?
> >         "don't use printk()"?
> 
> May I ask, why are we doing the printk in the first place?

this argument may really be applied against your patch as well. I
really don't want us to have this type of "technical" discussion.

printk() is a tool for developers. but developers can't use it.


> >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> >     console_owner is set.
> 
> I haven't looked at the latest code, but my last patch didn't care
> about "atomic" and "non-atomic"

I know. and I think it is sort of a problem.

lots of printk-s are happening from IRQs / softirqs and so on.
take a look at CONFIG_IP_ROUTE_VERBOSE, for example.

do_softirq() -> ip_handle_martian_source() and a bunch of other
places. 
these irq->printk-s can "steal" the console_sem and go to
console_unlock().
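
a simplified sketch of the relevant part of vprintk_emit() (illustrative
only, not the actual kernel code; the log_store() step is shown as a
comment):

	/* the message has already been appended to logbuf via log_store() */

	/*
	 * opportunistic: whatever context wins the trylock - even a softirq
	 * inside ip_handle_martian_source() - becomes the printing context
	 * and loops over call_console_drivers() for every pending message.
	 */
	if (console_trylock())
		console_unlock();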

"don't use printk() then" type of argument does not really help
to a guy who reports the lockup.


> > Steven, I thought we reached the agreement [**] that the solution we should
> > be working on is a combination of prinkt_kthread and console_sem hand
> > off. Simply because it adds the missing "there is a non-atomic CPU wishing
> > to console_unlock()" thing.
> > 
> > 	lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com
> > 
> > 	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> > 	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> > 	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> > 	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> > 	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020565222469&w=2
> 
> I'm still fine with the hybrid approach, but I want to see a problem
> first before we fix it.
> 
> > 
> > 
> > what am I missing?
> 
> The reproducer.

will that printk_test module

  lkml.kernel.org/r/20171204135314.9122-2-sergey.senozhatsky@gmail.com

suffice?

	-ss

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  5:06             ` Sergey Senozhatsky
@ 2017-12-15  6:52               ` Sergey Senozhatsky
  2017-12-15 15:39                 ` Steven Rostedt
  2017-12-15  8:31               ` Petr Mladek
  2017-12-15 15:19               ` Steven Rostedt
  2 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-15  6:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
> 
> um... hundreds of cases.
> 
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).


and, Steven, one more thing. wondering what's your opinion.


suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. Is the proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit on console_owner with local IRQs disabled
   waiting for either non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand
   off printing. so the current CPU, after busy-waiting for a foreign CPU's
   call_console_drivers(), will go and do its own call_console_drivers().
   which, time-wise, simply doubles (roughly) the amount of time that
   CPU spends in printk()->console_unlock(). agreed?

   if we previously could have a case where the non-atomic printk CPU would
   grab the console_sem and print all atomic printk CPUs' messages first,
   and then its own messages (so the atomic printk CPUs would do just
   log_store()), now we will have CPUs do call_console_drivers() and
   spin on the console_sem owner waiting for call_console_drivers() on a
   foreign CPU  [not all of them: it's one CPU doing the print out and one
   CPU spinning on console_owner. but overall I think every CPU will
   experience that spin on console_sem waiting for call_console_drivers()
   and then do its own call_console_drivers()].


even the two-CPU case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.


CPU0                            CPU1

printk()                        printk()
 log_store()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin
                                 call_console_drivers()
                                 sees console_waiter
                                  break

                                printk()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
  break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin

....


that "wait for call_console_drivers() on another CPU and then do
its own call_console_drivers()" pattern does look dangerous. the
benefit of hand-off is really fragile sometimes, isn't it?
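
for reference, the waiter side of that hand off looks roughly like this
(an approximate reconstruction for illustration only; the real patch
differs in details like lockdep annotations and the exact types of
console_owner/console_waiter, which are printk.c statics):

	static bool console_trylock_spinning_sketch(void)
	{
		struct task_struct *owner;
		unsigned long flags;
		bool spin = false;

		if (console_trylock())
			return true;

		local_irq_save(flags);

		raw_spin_lock(&console_owner_lock);
		owner = READ_ONCE(console_owner);
		if (!READ_ONCE(console_waiter) && owner && owner != current) {
			/* register ourselves as the waiter */
			WRITE_ONCE(console_waiter, true);
			spin = true;
		}
		raw_spin_unlock(&console_owner_lock);

		/* busy-wait, local IRQs off, until the owner clears console_waiter */
		while (spin && READ_ONCE(console_waiter))
			cpu_relax();

		local_irq_restore(flags);

		/*
		 * if we spun, the owner handed console_sem over to us: we return
		 * with the lock held and do our own call_console_drivers() loop
		 * from whatever context we happen to be in.
		 */
		return spin;
	}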

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  5:06             ` Sergey Senozhatsky
  2017-12-15  6:52               ` Sergey Senozhatsky
@ 2017-12-15  8:31               ` Petr Mladek
  2017-12-15  8:42                 ` Sergey Senozhatsky
  2017-12-15 15:42                 ` Steven Rostedt
  2017-12-15 15:19               ` Steven Rostedt
  2 siblings, 2 replies; 79+ messages in thread
From: Petr Mladek @ 2017-12-15  8:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri 2017-12-15 14:06:07, Sergey Senozhatsky wrote:
> Hello,
> 
> On (12/14/17 22:18), Steven Rostedt wrote:
> > > Steven, your approach works ONLY when we have the following preconditions:
> > > 
> > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > >     etc) context
> > > 
> > >         what does guarantee that? what happens if there is NO non-atomic
> > >         CPU or that non-atomic simplky missses the console_owner != false
> > >         point? we are going to conclude
> > > 
> > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > 
> > > 
> > >         what if that non-atomic CPU does not call printk(), but instead
> > >         it does console_lock()/console_unlock()? why there is no handoff?
> > > 
> > >         CPU0				CPU1 ~ CPU10
> > > 					in atomic contexts [!]. ping-ponging console_sem
> > > 					ownership to each other. while what they really
> > > 					need to do is to simply up() and let CPU0 to
> > > 					handle it.
> > > 					printk
> > > 	console_lock()
> > > 	 schedule()
> > > 					...
> > > 					printk
> > > 					printk
> > > 					...
> > > 					printk
> > > 					printk
> > > 
> > > 					up()
> > > 
> > > 	// woken up
> > > 	console_unlock()
> > > 
> > >         why do we make an emphasis on fixing vprintk_printk()?

Is the above scenario really dangerous? console_lock() owner is
able to sleep. Therefore there is no risk of a softlockup.

Sure, many messages will get stacked in the meantime and the console
ownership may then get passed to another owner in atomic context. But
do you really see this in real life?


> > Where do we do the above? And has this been proven to be an issue?
> 
> um... hundreds of cases.
> 
> I work for a company that has several thousand engineers spread
> across the globe. and people do use printk(), and issues do happen.

Do people have issues with the current upstream printk(), or do the
issues persist even with Steven's patch?

My current view is that Steven's patch could not make things
worse. I was afraid of a possible deadlock but it seems that I was
wrong. Other than that, the patch should make things just better,
because it allows passing the work on, from time to time, in a safe way.

Of course, there is a chance that it will pass the work from
a safe context to an atomic one. But there was the same chance that
the work already started in an atomic context. Therefore, statistically,
this should not make things worse.

This is why I suggest starting with Steven's solution. If people
still see problems in real life then we could think
about how to fix them. It is quite likely that we would need to add
offloading to the kthreads in the end, but there is a chance...

In any case, I think it is better to split this into
two or even more steps than to introduce one mega-complex
change. And given the many years of resistance against offloading,
I tend to start with Steven's approach.

Does this make some sense, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  8:31               ` Petr Mladek
@ 2017-12-15  8:42                 ` Sergey Senozhatsky
  2017-12-15  9:08                   ` Petr Mladek
  2017-12-15 15:42                 ` Steven Rostedt
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-15  8:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo,
	Sergey Senozhatsky, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, linux-kernel

On (12/15/17 09:31), Petr Mladek wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following preconditions:
> > > > 
> > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > >     etc) context
> > > > 
> > > >         what does guarantee that? what happens if there is NO non-atomic
> > > >         CPU or that non-atomic simplky missses the console_owner != false
> > > >         point? we are going to conclude
> > > > 
> > > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > > 
> > > > 
> > > >         what if that non-atomic CPU does not call printk(), but instead
> > > >         it does console_lock()/console_unlock()? why there is no handoff?
> > > > 
> > > >         CPU0				CPU1 ~ CPU10
> > > > 					in atomic contexts [!]. ping-ponging console_sem
> > > > 					ownership to each other. while what they really
> > > > 					need to do is to simply up() and let CPU0 to
> > > > 					handle it.
> > > > 					printk
> > > > 	console_lock()
> > > > 	 schedule()
> > > > 					...
> > > > 					printk
> > > > 					printk
> > > > 					...
> > > > 					printk
> > > > 					printk
> > > > 
> > > > 					up()
> > > > 
> > > > 	// woken up
> > > > 	console_unlock()
> > > > 
> > > >         why do we make an emphasis on fixing vprintk_printk()?
> 
> Is the above scenario really dangerous? console_lock() owner is
> able to sleep. Therefore there is no risk of a softlockup.
> 
> Sure, many messages will get stacked in the meantime and the console
> owner my get then passed to another owner in atomic context. But
> do you really see this in the real life?

console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
sleeping waiting for the console_sem. while atomic printk CPUs just hand
off console_sem ownership to each other without ever up()-ing the console_sem.
what's the point of hand off here? how is that going to work?

what we need to do is to offload printing from atomic contexts to a
non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
on the console_sem, ready to continue printing. but it never gets its
chance to do so, because CPU1 ~ CPU10 just keep passing console_sem
ownership around, resulting in the same "we print from atomic context" thing.

> Of course, there is a chance that it will pass the work from
> a safe context to atomic one. But there was the same chance that
> the work already started in the atomic context. Therefore statistically
> this should not make things worse.

which is not a justification. we are not looking for a solution that
does not make things worse. we are looking for a solution that
does improve things.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  8:42                 ` Sergey Senozhatsky
@ 2017-12-15  9:08                   ` Petr Mladek
  2017-12-15 15:47                     ` Steven Rostedt
  2017-12-18  9:36                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 79+ messages in thread
From: Petr Mladek @ 2017-12-15  9:08 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri 2017-12-15 17:42:36, Sergey Senozhatsky wrote:
> On (12/15/17 09:31), Petr Mladek wrote:
> > > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > > Steven, your approach works ONLY when we have the following preconditions:
> > > > > 
> > > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > > >     etc) context
> > > > > 
> > > > >         what does guarantee that? what happens if there is NO non-atomic
> > > > >         CPU or that non-atomic simplky missses the console_owner != false
> > > > >         point? we are going to conclude
> > > > > 
> > > > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > > > 
> > > > > 
> > > > >         what if that non-atomic CPU does not call printk(), but instead
> > > > >         it does console_lock()/console_unlock()? why there is no handoff?
> > > > > 
> > > > >         CPU0				CPU1 ~ CPU10
> > > > > 					in atomic contexts [!]. ping-ponging console_sem
> > > > > 					ownership to each other. while what they really
> > > > > 					need to do is to simply up() and let CPU0 to
> > > > > 					handle it.
> > > > > 					printk
> > > > > 	console_lock()
> > > > > 	 schedule()
> > > > > 					...
> > > > > 					printk
> > > > > 					printk
> > > > > 					...
> > > > > 					printk
> > > > > 					printk
> > > > > 
> > > > > 					up()
> > > > > 
> > > > > 	// woken up
> > > > > 	console_unlock()
> > > > > 
> > > > >         why do we make an emphasis on fixing vprintk_printk()?
> > 
> > Is the above scenario really dangerous? console_lock() owner is
> > able to sleep. Therefore there is no risk of a softlockup.
> > 
> > Sure, many messages will get stacked in the meantime and the console
> > owner my get then passed to another owner in atomic context. But
> > do you really see this in the real life?
> 
> console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> sleeping waiting for the console_sem. while atomic printk CPUs just hand
> off console_sem ownership to each other without ever up()-ing the console_sem.
> what's the point of hand off here? how is that going to work?
> 
> what we need to do is to offload printing from atomic contexts to a
> non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> on the console_sem, ready to continue printing. but it never gets its
> chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> around, resulting in the same "we print from atomic context" thing.

Yes, I understand the scenario. The question is how realistic it is.
And if it is realistic, the question is whether Steven's
patch helps to avoid the softlockup or not.

IMHO, what we need is to push the patch into the wild and wait for
real-life reports.


> > Of course, there is a chance that it will pass the work from
> > a safe context to atomic one. But there was the same chance that
> > the work already started in the atomic context. Therefore statistically
> > this should not make things worse.
> 
> which is not a justification. we are not looking for a solution that
> does not make the things worse. we are looking for a solution that
> does improve the things.

But it does improve things! The question is whether it is enough
in real life.

Do you see a scenario where it makes things statistically or
even deterministically worse?

Why did you completely ignore the paragraph about the step-by-step
approach? Is there anything wrong with it?


You are looking for a perfect solution. But there is no perfect
solution. There will still be a conflict between the user requirements:
"get messages out" vs. "do not lock up the machine".

The nice thing about Steven's solution is that it slightly improves
one side and does not make the other side worse. Or am I wrong?
Sure, it is possible that it will not be enough. But why not try
this small step first?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  5:06             ` Sergey Senozhatsky
  2017-12-15  6:52               ` Sergey Senozhatsky
  2017-12-15  8:31               ` Petr Mladek
@ 2017-12-15 15:19               ` Steven Rostedt
  2017-12-19  0:52                 ` Sergey Senozhatsky
  2 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15 15:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri, 15 Dec 2017 14:06:07 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Hello,
> 
> On (12/14/17 22:18), Steven Rostedt wrote:
> > > Steven, your approach works ONLY when we have the following preconditions:
> > > 
> > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > >     etc) context
> > > 
> > >         what does guarantee that? what happens if there is NO non-atomic
> > >         CPU or that non-atomic simplky missses the console_owner != false
> > >         point? we are going to conclude
> > > 
> > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > 
> > > 
> > >         what if that non-atomic CPU does not call printk(), but instead
> > >         it does console_lock()/console_unlock()? why there is no handoff?

The case here is that you are talking about a CPU doing console_lock()
from a non-printk() path, which is what I was asking about: how often
does this happen?

As for why there's no handoff: does the non-printk()
console_lock()/console_unlock() ever happen from a critical location? I
don't think it does (but I haven't checked). If not, then it is the
perfect candidate to do all the printing.
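
For example, a hypothetical (made-up, not from the tree) non-printk()
console_lock() user in plain process context would look like this; the
console_unlock() at the end is what flushes everything:

	static void update_console_state(void)
	{
		console_lock();		/* may sleep; schedulable context */

		/* ... poke the console driver, update some state ... */

		console_unlock();	/* prints every pending logbuf message */
	}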


> > > 
> > >         CPU0				CPU1 ~ CPU10
> > > 					in atomic contexts [!]. ping-ponging console_sem
> > > 					ownership to each other. while what they really
> > > 					need to do is to simply up() and let CPU0 to
> > > 					handle it.
> > > 					printk
> > > 	console_lock()
> > > 	 schedule()
> > > 					...
> > > 					printk
> > > 					printk
> > > 					...
> > > 					printk
> > > 					printk
> > > 
> > > 					up()
> > > 
> > > 	// woken up
> > > 	console_unlock()
> > > 
> > >         why do we make an emphasis on fixing vprintk_printk()?  
> > 
> > Where do we do the above? And has this been proven to be an issue?  
> 
> um... hundreds of cases.

I was asking about doing the console_unlock() from the non-printk() case.

> 
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).
> 

Is the above showing an issue of console_lock() happening in the
non-printk() case?

> 
> do we periodically do console_lock() on a running system? yes, we do.
> add to console_unlock()

Right, and the non printk() console_lock() should be fine to do all
printing when it unlocks.

> 
> ---
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b9006617710f..1c811f6d94bf 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2143,6 +2143,10 @@ void console_unlock(void)
>         bool wake_klogd = false;
>         bool do_cond_resched, retry;
>  
> +       if (!(current->flags & PF_KTHREAD))
> +               dump_stack();
> +
> +
>         if (console_suspended) {
>                 up_console_sem();
>                 return;
> 
> ---
> 
> and just boot the system.
> 
> 
> I work for a company that has several thousand engineers spread
> across the globe. and people do use printk(), and issues do happen.

Sure, and I still think my patch is good enough.

> 
> the scenarios that Tejun and I talk about are not theoretical. if those
> scenarios are completely theoretical, as you suggest, - then, OK, what
> exactly guarantees that

And I still think my patch is good enough.

> 
> 	whenever atomic CPUs printk there is always a non-atomic
> 	CPU to take over the printing?

No, and I don't think it has to.

> 
> 
> 
> > >  b) non-atomic CPU sees console_owner set (which is set for a very short
> > >     period of time)
> > > 
> > >         again. what if that non-atomic CPU does not see console_owner?
> > >         "don't use printk()"?  
> > 
> > May I ask, why are we doing the printk in the first place?  
> 
> this argument is really may be applied against your patch as well. I
> really don't want us to have this type of "technical" discussion.

Sure, but my patch fixes the unfair approach that printk currently takes.

> 
> printk() is a tool for developers. but developers can't use.
> 
> 
> > >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> > >     console_owner is set.  
> > 
> > I haven't looked at the latest code, but my last patch didn't care
> > about "atomic" and "non-atomic"  
> 
> I know. and I think it is sort of a problem.

Please show me a case where it is, and don't just explain where it could be.
Please apply the patch, have the problem occur, and show it to me.
That's all that I'm asking for.

> 
> lots of printk-s are happening from IRQs / softirqs and so on.
> take a look at CONFIG_IP_ROUTE_VERBOSE, for example.

Yep, understood.

> 
> do_softirq() -> ip_handle_martian_source() and a bunch of other
> places. 
> these irq->printk-s can "steal" the console_sem and go to
> console_unlock().
> 
> "don't use printk() then" type of argument does not really help
> to a guy who reports the lockup.

Heh, one argument at a time. The "don't use printk" argument comes later.
Right now, I want to see a problem that is not fixed by my patch.


> 
> 
> > > Steven, I thought we reached the agreement [**] that the solution we should
> > > be working on is a combination of prinkt_kthread and console_sem hand
> > > off. Simply because it adds the missing "there is a non-atomic CPU wishing
> > > to console_unlock()" thing.
> > > 
> > > 	lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com
> > > 
> > > 	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> > > 	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> > > 	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> > > 	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> > > 	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> > > **	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> > > **	https://marc.info/?l=linux-kernel&m=151020565222469&w=2  
> > 
> > I'm still fine with the hybrid approach, but I want to see a problem
> > first before we fix it.
> >   
> > > 
> > > 
> > > what am I missing?  
> > 
> > The reproducer.  
> 
> will that printk_test module
> 
>   lkml.kernel.org/r/20171204135314.9122-2-sergey.senozhatsky@gmail.com
> 
> suffice?

No, because it is unrealistic. For example:

+static void test_noirq_console_unlock(void)
+{
+       unsigned long flags;
+       unsigned long num_messages = 0;
+
+       pr_err("=== TEST %s\n", __func__);
+
+       num_messages = 0;
+       console_lock();
+       while (num_messages++ < max_num_messages)
+               pr_info("=== %s Append message %lu out of %lu\n",
+                               __func__,
+                               num_messages,
+                               max_num_messages);
+
+       local_irq_save(flags);
+       console_unlock();

Where in the kernel do we do this?

Where do we take console_lock() in a preemptible context and then
release it in a non-preemptible context, besides in printk?

You just manufactured a scenario that my patch does not cover, because
it only covers printk()'s console lock and unlock, because that printk
path is a known state.

I'm looking for real scenarios in a production kernel.

-- Steve




+       local_irq_restore(flags);
+}

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  6:52               ` Sergey Senozhatsky
@ 2017-12-15 15:39                 ` Steven Rostedt
  0 siblings, 0 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15 15:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri, 15 Dec 2017 15:52:05 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> On (12/15/17 14:06), Sergey Senozhatsky wrote:
> [..]
> > > Where do we do the above? And has this been proven to be an issue?  
> > 
> > um... hundreds of cases.
> > 
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the configuration).  
> 
> 
> and, Steven, one more thing. wondering what's your opinion.
> 
> 
> suppose we have consoe_owner hand off enabled, 1 non-atomic CPU doing
> printk-s and several atomic CPUs doing printk-s. Is proposed hand off
> scheme really useful in this case? CPUs will now
> 
> a) print their lines (a potentially slow call_console_drivers())

My question is: is this an actual problem? Because we haven't fully
evaluated whether my patch is enough to make this a non-issue.

> 
> and
> 
> b) spin in vprintk_emit on console_owner with local IRQs disabled
>    waiting for either non-atomic printk CPU or another atomic CPU
>    to finish printing its line (call_console_drivers()) and to hand
>    off printing. so current CPU, after busy-waiting for foreign CPU's
>    call_console_drivers(), will go and do his own call_console_drivers().
>    which, time-wise, simply doubles (roughly) the amount of time that
>    CPU spends in printk()->console_unlock(). agreed?

Worst case is doubling, if the two printks happen at the same time.

Today, it is unbounded, which means my patch is much better than what
we do today, and hence we still don't know if it is "good enough". This
is where I say you are trying to solve a different problem than what we
are encountering today. Because we don't know if the problem is due to the
unbounded nature of printk, or just to a slightly longer (but bounded) time.

> 
>    if we previously could have a case when non-atomic printk CPU would
>    grab the console_sem and print all atomic printk CPUs messages first,
>    and then its own messages, thus atomic printk CPUs would have just
>    log_store(), now we will have CPUs to call_console_driver() and to
>    spin on console_sem owner waiting for call_console_driver() on a foreign
>    CPU  [not all of them: it's one CPU doing the print out and one CPU
>    spinning console_owner. but overall I think all CPUs will experience
>    that spin on console_sem waiting for call_console_driver() and then do
>    its own call_console_driver()].
> 
> 
> even two CPUs case is not so simple anymore. see below.
> 
> - first, assume one CPU is atomic and one is non-atomic.
> - second, assume that both CPUs are atomic CPUs, and go thought it again.
> 
> 
> CPU0                            CPU1
> 
> printk()                        printk()
>  log_store()
>                                  log_store()
>  console_unlock()
>   set console_owner
>                                  sees console_owner
>                                  sets console_waiter
>                                  spin
>   call_console_drivers()
>   sees console_waiter
>    break
> 
> printk()
>  log_store()
>                                  console_unlock()
>                                   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
>                                  call_console_drivers()
>                                  sees console_waiter
>                                   break
> 
>                                 printk()
>                                  log_store()
>  console_unlock()
>   set console_owner
>                                  sees console_owner
>                                  sets console_waiter
>                                  spin
>   call_console_drivers()
>   sees console_waiter
>   break
> 
> printk()
>  log_store()
>                                  console_unlock()
>                                   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
> 
> ....
> 
> 
> that "wait for call_console_drivers() on another CPU and then do
> its own call_console_drivers()" pattern does look dangerous. the
> benefit of hand-off is really fragile sometimes, isn't it?

Actually, I see your scenario as a perfect example of my patch working
well. You have two CPUs spamming the console with printks, and instead
of one CPU stuck doing nothing but outputting both CPUs' work, the two
share the load.

Again, I'm not convinced that all the issues you have encountered so
far cannot be solved with my patch. I would like to see a real example
of where it fails. Let's not make this a theoretical approach; let's do
it incrementally and solve one problem at a time.

I know for a fact that one CPU outputting all other CPUs' printks is
a problem. That has been shown time and time again (and I believe it is
the cause of the problems you stated).

The cause of the above issue is the unbounded nature of printk. The
fact that there's no guarantee that printk will ever finish. My patch
makes it have to finish and makes it bounded. In RT, that's what we
always work on, and that usually is good enough. Priority inheritance
can take a long time too, but it causes everything to have a guaranteed
max latency. My patch makes printk guaranteed to stop, and I want to
see if this is good enough for all current issues we've come across
before we go into any more elaborate algorithms.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  8:31               ` Petr Mladek
  2017-12-15  8:42                 ` Sergey Senozhatsky
@ 2017-12-15 15:42                 ` Steven Rostedt
  1 sibling, 0 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15 15:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri, 15 Dec 2017 09:31:51 +0100
Petr Mladek <pmladek@suse.com> wrote:

> Do people have issues with the current upstream printk() or
> still even with Steven's patch?
> 
> My current view is that Steven's patch could not make things
> worse. I was afraid of possible deadlock but it seems that I was
> wrong. Other than that the patch should make things just better
> because it allows to pass the work from time to time a safe way.
> 
> Of course, there is a chance that it will pass the work from
> a safe context to atomic one. But there was the same chance that
> the work already started in the atomic context. Therefore statistically
> this should not make things worse.
> 
> This is why I suggest to start with Steven's solution. If people
> would still see problems in the real life then we could think
> about how to fix it. It is quite likely that we would need to add
> offloading to the kthreads in the end but there is a chance...
> 
> In each case, I think that is better to split in into
> two or even more steps than introducing one mega-complex
> change. And given the many-years resistance against offloading
> I tend to start with Steven's approach.

THANK YOU!!!

This is exactly what I'm trying to convey.

> 
> Does this make some sense, please?

It definitely does to me :-)

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  9:08                   ` Petr Mladek
@ 2017-12-15 15:47                     ` Steven Rostedt
  2017-12-18  9:36                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15 15:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Fri, 15 Dec 2017 10:08:01 +0100
Petr Mladek <pmladek@suse.com> wrote:

> You are looking for a perfect solution. But there is no perfect
> solution.

"Perfection is the enemy of 'good enough'" :-)

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 14:39   ` Sergey Senozhatsky
@ 2017-12-15 15:55     ` Steven Rostedt
  0 siblings, 0 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-15 15:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, linux-kernel,
	Sergey Senozhatsky

On Thu, 14 Dec 2017 23:39:36 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> On (12/14/17 15:27), Petr Mladek wrote:
> >
> > Therefore I tend to give Steven's solution a chance before this
> > combined approach.
> >   
> 
> have you seen this https://marc.info/?l=linux-kernel&m=151015850209859
> or this https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> or this https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> 

And these are all still just hand waving. Can we please apply my patch
and send it off into the wild? It doesn't make things worse, but it
does make things better.

My patch probably would have kept this from being a problem:

 http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15  9:08                   ` Petr Mladek
  2017-12-15 15:47                     ` Steven Rostedt
@ 2017-12-18  9:36                     ` Sergey Senozhatsky
  2017-12-18 10:36                       ` Sergey Senozhatsky
  2017-12-18 13:31                       ` Petr Mladek
  1 sibling, 2 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-18  9:36 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On (12/15/17 10:08), Petr Mladek wrote:
[..]
> > > Is the above scenario really dangerous? console_lock() owner is
> > > able to sleep. Therefore there is no risk of a softlockup.
> > > 
> > > Sure, many messages will get stacked in the meantime and the console
> > > owner my get then passed to another owner in atomic context. But
> > > do you really see this in the real life?
> > 
> > console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> > sleeping waiting for the console_sem. while atomic printk CPUs just hand
> > off console_sem ownership to each other without ever up()-ing the console_sem.
> > what's the point of hand off here? how is that going to work?
> > 
> > what we need to do is to offload printing from atomic contexts to a
> > non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> > on the console_sem, ready to continue printing. but it never gets its
> > chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> > around, resulting in the same "we print from atomic context" thing.
> 
> Yes, I understand the scenario. The question is how much it is
> realistic.

so it's unlikely to have several CPUs on an SMP system printk-ing
from atomic contexts while one of the available CPUs does console_lock()
or printk() from a non-atomic context?


[..]
> > which is not a justification. we are not looking for a solution that
> > does not make the things worse. we are looking for a solution that
> > does improve the things.
> 
> But it does improve things! The question is if it is enough or not
> in the real life.

console_unlock() is still unbounded.


"spreading the printk load" between N CPUs is just 50% of the actual
problem.

console_unlock() has several sides involved: one is doing the print
out, the other is sleeping on console_sem waiting for the first one to
up() the console_sem. yes, CPUs that do printk will now hand over
console_sem to each other, but up() still does not happen for an unknown
period of time. so all of the processes that sleep in TASK_UNINTERRUPTIBLE
on console_sem - user space processes, tty, pty, drm, frame buffers, PM,
etc. - still have an unbounded TASK_UNINTERRUPTIBLE sleep. we've been
talking about it for years. but there are more issues.


> Do you see a scenario where it makes things statistically or
> even deterministically worse?

I really don't have that much time right now, but I did a very quick
test on one of my boards today.

NOTE: speaking for myself only and about my observations only.
      you are free to call it unrealistic, impossible, silly, etc.

One more NOTE:
      The board I'm playing with has immediate printk offloading enabled.
      We figured out it's better to have it enabled rather than not, after
      all. It makes a huge difference.

And another NOTE:
      I did NOT alter my setup; I did NOT perform the stupid printk-flood
      type of test (while (1) printk()). I used the "tools" we are using here
      every day, which don't really abuse printk.



// UPD... OK... I actually ended up spending several hours on it, much more
// than I planned. because... I was quite puzzled. I double... no... triple
// checked the backport. It's exactly what v4 posted by Steven does - modulo
// printk_safe stuff [I don't have it backported].


Back to the console_owner patch.

1) it opens both soft and hard lockup vectors

   I see *a lot* of cases when the CPU that calls printk in a loop does not
   end up flushing its messages. And the problem seems to be preemption.


  CPU0						CPU1

  for_each_process_thread(g, p)
    printk()
    console_unlock()				printk
    						 console_trylock() fails
    sets console_owner
						 sees console_owner
						 sets console_waiter
    call_console_drivers
    clears console_owner
    sees console_waiter
    hand off					 spins with local_irq disabled
						 sees that it has acquired console_sem ownership

						 enables local_irq
    printk
    ..						 << preemption >>
    printk
    ...         unbound number of printk-s
    printk
    ..
    printk
						back to TASK_RUNNING
						goes to console_unlock()
    printk
						local_irq_save

    ???
						*** if it will see console_waiter [being in any
						context] it will hand off. otherwise, preemption
						again and CPU0 can add even more messages to logbuf

						local_irq_restore

						<< preemption >>



   I added several trace points, trying to understand why this patch
   was so unimpressive on my board.

console_trylock() was successful
   - trace_offloading("vprintk_emit()->trylock OK", " ", 0);

console_trylock() was unsuccessful and we went through `if (!waiter && owner && owner != current)'
under console_owner_lock spin_lock with local_irqs disabled
   - trace_offloading("vprintk_emit()->trylock FAIL", " will spin? ", spin);

task in console_unlock() set console_owner, under console_owner_lock
   - trace_offloading("set console_owner", " ", 0);

task in console_unlock() cleared console_owner, under console_owner_lock
   - trace_offloading("clear console_owner", " waiter != NULL ", !!waiter);

task in console_unlock() saw the waiter, broke out of the printing loop and
handed off printing
   - trace_offloading("hand off", " ", 0);



a *very small* part of the logs. a quick introduction [I changed the
names of the processes, so it'll be easier]:

i_do_printks-2997 is the process that does the printk() loop. the log starts
with it being in console_unlock(), just when it has set the console_owner.
usertaskA-1167 comes in, fails to trylock, sees the console_owner, sets
itself as console_waiter. kworker/0:0-2990 comes in, fails to trylock
console_sem, fails to set itself as console_waiter, leaves vprintk_emit().
i_do_printks-2997 sees console_waiter, hands off printing to usertaskA-1167.
usertaskA-1167 gets preempted. i_do_printks-2997 continues its printk-s,
which are just log_store(), because console_sem is locked and its owner is
preempted.


   ...
    i_do_printks-2997  [003] d..1   792.616378: offloading: set console_owner  :0
       usertaskA-1167  [001] d..1   792.617560: offloading: vprintk_emit()->trylock FAIL  will spin? :1
     kworker/0:0-2990  [000] d..1   792.618280: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.618387: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.618470: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.618478: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.618723: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.618902: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.619057: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.619064: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.620077: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.620323: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.620444: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.620927: offloading: vprintk_emit()->trylock FAIL  will spin? :0
     kworker/0:0-2990  [000] d..1   792.620958: offloading: vprintk_emit()->trylock FAIL  will spin? :0
    i_do_printks-2997  [003] d..1   792.629275: offloading: clear console_owner  waiter != NULL :1
    i_do_printks-2997  [003] d..1   792.629280: offloading: hand off  :0
       usertaskA-1167  [001] d..1   792.629281: offloading: new waiter acquired ownership:0
    i_do_printks-2997  [003] d..1   792.630639: offloading: vprintk_emit()->trylock FAIL  will spin? :0
    i_do_printks-2997  [003] d..1   792.630663: offloading: vprintk_emit()->trylock FAIL  will spin? :0
   ...

	    << CUT. printks, printks, printks. boring stuff. just many
	    printks. usertaskA-1167 is preempted, holding the console_sem,
	    so i_do_printks-2997 just log_store >>

   ...
    i_do_printks-2997  [003] d..1   792.645663: offloading: vprintk_emit()->trylock FAIL  will spin? :0
    i_do_printks-2997  [003] d..1   792.645670: offloading: vprintk_emit()->trylock FAIL  will spin? :0
       usertaskA-1167  [001] d..1   792.654176: offloading: set console_owner  :0
   systemd-udevd-672   [003] d..2   792.660132: offloading: vprintk_emit()->trylock FAIL  will spin? :1
       usertaskA-1167  [001] d..1   792.661759: offloading: clear console_owner  waiter != NULL :1
       usertaskA-1167  [001] d..1   792.661761: offloading: hand off  :0
   systemd-udevd-672   [003] d..2   792.661761: offloading: new waiter acquired ownership:0
   systemd-udevd-672   [003] d..2   792.661772: offloading: set console_owner  :0
   systemd-udevd-672   [003] d..2   792.675610: offloading: clear console_owner  waiter != NULL :0
   systemd-udevd-672   [003] d..2   792.675628: offloading: set console_owner  :0
   systemd-udevd-672   [003] d..2   792.689281: offloading: clear console_owner  waiter != NULL :0
   ...

	   << CUT. systemd now actually prints the messages that were added by
	   i_do_printks-2997. usertaskA-1167 was preempted all that time and
	   printed just ONE message before it handed printing duty over to
	   systemd-udevd, which had to print lots of pending logbuf messages. >>

	   << then another user-space process was lucky enough to set itself
	   as console_waiter, it got console_sem ownership from systemd-udevd-672
	   and spent enough time in console_unlock() to get killed by the user
	   watchdog >>


I can do more tests a bit later.
I really see lots of problems even in trivial tests [i_do_printks-2997
is preemptible].


another test, which demonstrates preemption and rescheduling. we start with a
ping-pong between i_do_printks-3616 and usertaskA-370. but soon usertaskA-370
acquires the lock and gets preempted.


 i_do_printks-3616  [001] d..1  1923.711382: offloading: set console_owner  :0
 i_do_printks-3616  [001] d..1  1923.724524: offloading: clear console_owner  waiter != NULL :0
 i_do_printks-3616  [001] d..1  1923.724685: offloading: set console_owner  :0
    usertaskA-370   [000] d..1  1923.734280: offloading: vprintk_emit()->trylock FAIL  will spin? :1
 i_do_printks-3616  [001] d..1  1923.737847: offloading: clear console_owner  waiter != NULL :1
    usertaskA-370   [000] d..1  1923.737850: offloading: new waiter acquired ownership:0
 i_do_printks-3616  [001] d..1  1923.737850: offloading: hand off  :0
    usertaskA-370   [000] d..1  1923.737862: offloading: set console_owner  :0
 i_do_printks-3616  [001] d..1  1923.737934: offloading: vprintk_emit()->trylock FAIL  will spin? :1
    usertaskA-370   [000] d..1  1923.751870: offloading: clear console_owner  waiter != NULL :1
    usertaskA-370   [000] d..1  1923.751875: offloading: hand off  :0
 i_do_printks-3616  [001] d..1  1923.751875: offloading: new waiter acquired ownership:0
 i_do_printks-3616  [001] d..1  1923.751938: offloading: set console_owner  :0
    usertaskA-370   [003] d..1  1923.762570: offloading: vprintk_emit()->trylock FAIL  will spin? :1
 i_do_printks-3616  [001] d..1  1923.765193: offloading: clear console_owner  waiter != NULL :1
 i_do_printks-3616  [001] d..1  1923.765196: offloading: hand off  :0
    usertaskA-370   [003] d..1  1923.765196: offloading: new waiter acquired ownership:0
 i_do_printks-3616  [002] d..1  1923.766057: offloading: vprintk_emit()->trylock FAIL  will spin? :0
 i_do_printks-3616  [002] d..1  1923.766073: offloading: vprintk_emit()->trylock FAIL  will spin? :0
 ...

	   << CUT. printks, printks, printks. usertaskA-370 is preempted, i_do_printks-3616
	   just does log_store(). Notice that the new console_sem owner acquired the ownership
	   on CPU3, but started to print pending logbuf messages only later, after first being
	   scheduled on CPU2 and then rescheduled on CPU1. >>

 ...
 i_do_printks-3616  [002] d..1  1923.767162: offloading: vprintk_emit()->trylock FAIL  will spin? :0
 i_do_printks-3616  [002] d..1  1923.767166: offloading: vprintk_emit()->trylock FAIL  will spin? :0
    usertaskA-370   [002] d..1  1923.769225: offloading: set console_owner  :0
    usertaskA-370   [002] d..1  1923.779213: offloading: clear console_owner  waiter != NULL :0
    usertaskA-370   [001] d..1  1923.782622: offloading: set console_owner  :0
    usertaskA-370   [001] d..1  1923.793238: offloading: clear console_owner  waiter != NULL :0
    usertaskA-370   [001] d..1  1923.794991: offloading: set console_owner  :0
    usertaskA-370   [001] d..1  1923.807259: offloading: clear console_owner  waiter != NULL :0
    usertaskA-370   [001] d..1  1923.807403: offloading: set console_owner  :0
    usertaskA-370   [001] d..1  1923.821280: offloading: clear console_owner  waiter != NULL :0
 ...

	  << usertaskA-370 prints all the pending messages, either until the logbuf
	     becomes empty, or until another task calls printk [possibly from atomic context]. >>


well, clearly, nothing prevents usertaskA-370 from sleeping really long
on a busy system. so the logbuf can contain a lot of messages to print
out.

and so on. I guess you got the picture by now.



now,
It is absolutely possible that we can have a preempted console_sem owner
   (somewhere in vprintk_emit right after acquiring the console_sem ownership,
    or somewhere in console_unlock(), before the printing loop, or after
    re-enabling local IRQs within the printing loop),
while another CPU adds messages to the logbuf (printk()-s); and then all it
takes is a printk() from IRQ or atomic context to take over printing
from the current console_sem owner and to flush all the pending messages
from atomic context, heading towards a lockup.


2) an unexpected test...

   the scenario which I mentioned last week.

	a) spin lock lockup on several CPUs (not on all CPUs)
	b) simultaneous dump_stack() from locked up CPUs
	c) trigger_all_cpu_backtrace()
	d) immediate WARN_ON from networking irq on one of the locked-up
	   CPUs (the CPUs eventually were able to acquire-release the spin_lock
	        and enable local IRQs).

   last week the board had immediate printk offloading enabled when the
   lockup happened, so all those printk-s from irqs/etc. were as fast
   as
   	lock(log_buf); sprintf(); memcpy(); unlock(log_buf);

        there was a printk_kthread to print it all.

   the issue reproduced again today, all of a sudden. twice. this time
   I had immediate printk disabled and console_owner hand off enabled.
   unfortunately, the board didn't survive the lockup this time. _it looks_
   like printk offloading was really crucial there.



> Why did you completely ignored the paragraph about step by step
> approach? Is there anything wrong about it?

frankly, I don't even understand what our plan is. I don't see how we
are going to verify the patch. over the last 3 years, how many emails
do you have from facebook or samsung or linaro, or any other company,
reporting printk-related lockups? I don't have that many in my gmail
inbox, to be honest. and this is not because there were no printk lockups
observed. this is because people usually don't report those issues to
the upstream community. especially vendors that use "outdated" LTS
kernels, which are approx 1-2 year(s) behind the mainline. and I don't see
why it should be different this time. it will take years before vendors
pick the next LTS kernel, which will have that patch in it. but the really
big problem here is that we already know that the patch has some problems.
are we going to conclude that "no emails === no problems"? with all my
respect, it does seem like, in the grand scheme of things, we are going
to do the same thing, yet expect a different result. that's my worry.

may be I'm wrong.



> You are looking for a perfect solution.

gentlemen, once again, you really can name it whatever you like.
the bottom line is [speaking for myself only]:

-  the patch does not fit my needs.
-  it does not address the issues I'm dealing with.
-  it has significantly worse behaviour compared to the old async printk.
-  it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE
      for a long time.
-  it times out user space apps.
-  it re-introduces the lockup vector, by passing console_sem ownership
      to atomic tasks.
-  it doubles the amount of time a CPU spins with local IRQs disabled in
      the worst case.
-  I probably need to run more tests [I didn't execute any OOM tests, etc.],
      but, preliminarily, I can't see Samsung using this patch.

sorry to tell you that.
maybe it's good enough for the rest of the world, but...

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18  9:36                     ` Sergey Senozhatsky
@ 2017-12-18 10:36                       ` Sergey Senozhatsky
  2017-12-18 12:35                         ` Sergey Senozhatsky
  2017-12-18 13:51                         ` Petr Mladek
  2017-12-18 13:31                       ` Petr Mladek
  1 sibling, 2 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-18 10:36 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Tejun Heo, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On (12/18/17 18:36), Sergey Senozhatsky wrote:
[..]
>    I see *a lot* of cases when CPU that call printk in a loop does not
>    end up flushing its messages. And the problem seems to be - preemption.
> 
> 
>   CPU0						CPU1
> 
>   for_each_process_thread(g, p)
>     printk()
>     console_unlock()				printk
>     						 console_trylock() fails
>     sets console_owner
> 						 sees console_owner
> 						 sets console_waiter
>     call_console_drivers
>     clears console_owner
>     sees console_waiter
>     hand off					 spins with local_irq disabled
> 						 sees that it has acquired console_sem ownership
> 
> 						 enables local_irq
>     printk
>     ..						 << preemption >>
>     printk
>     ...         unbound number of printk-s
>     printk
>     ..
>     printk
> 						back to TASK_RUNNING
> 						goes to console_unlock()
>     printk
> 						local_irq_save
> 
>     ???
> 						*** if it will see console_waiter [being in any
> 						context] it will hand off. otherwise, preemption
> 						again and CPU0 can add even more messages to logbuf
> 
> 						local_irq_restore
> 
> 						<< preemption >>


hm... adding preempt_disable()/preempt_enable() to vprintk_emit()
does not fix the issues on my side.

	preempt_disable();
	...
	if (console_trylock()) {
		console_unlock();
	} else {
		// console_owner check and spin on the owner if needed,
		// then console_unlock() once we own console_sem
	}
	preempt_enable();


the root cause seems to be the fact that log_store() is significantly
faster than msg_print_text() + call_console_drivers().


looking at this

   systemd-udevd-671   [002] dn.3    66.736432: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3    66.749927: offloading: clear console_owner  waiter != NULL :0
   systemd-udevd-671   [002] dn.3    66.749931: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3    66.763426: offloading: clear console_owner  waiter != NULL :0
   systemd-udevd-671   [002] dn.3    66.763430: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3    66.776925: offloading: clear console_owner  waiter != NULL :0

which is this thing

                len += msg_print_text(msg, console_prev, false,
                                      text + len, sizeof(text) - len);
                console_idx = log_next(console_idx);
                console_seq++;
                console_prev = msg->flags;
                raw_spin_unlock(&logbuf_lock);

                /*
                 * While actively printing out messages, if another printk()
                 * were to occur on another CPU, it may wait for this one to
                 * finish. This task can not be preempted if there is a
                 * waiter waiting to take over.
                 */
                raw_spin_lock(&console_owner_lock);
                console_owner = current;
                raw_spin_unlock(&console_owner_lock);

+                trace_offloading("set console_owner", " ", 0);

                /* The waiter may spin on us after setting console_owner */
                spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);

                stop_critical_timings();        /* don't trace print latency */
                call_console_drivers(level, text, len);
                start_critical_timings();

                raw_spin_lock(&console_owner_lock);
                waiter = READ_ONCE(console_waiter);
                console_owner = NULL;
                raw_spin_unlock(&console_owner_lock);

+                trace_offloading("clear console_owner",
+                                " waiter != NULL ", !!waiter);


it takes call_console_drivers() 0.01+ of a second to print some of
the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
and fully blame call_console_drivers()]. so vprintk_emit() seems to be
gazillion times faster and i_do_printks can add tons of messages while
some other process prints just one.


to be more precise, I see from the traces that i_do_printks can add 1100
messages to the logbuf while call_console_drivers() prints just one.


systemd-udevd-671 owns the lock and sets console_owner. i_do_printks
keeps adding printks. there is kworker/0:1-135 that gets just ahead of
i_do_printks-1992 and registers itself as the console_sem waiter.

   systemd-udevd-671   [003] d..3    66.334866: offloading: set console_owner  :0
     kworker/0:1-135   [000] d..2    66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
    i_do_printks-1992  [002] d..2    66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0    x 1100
   ...
   systemd-udevd-671   [003] d..3    66.345917: offloading: clear console_owner  waiter != NULL :1


i_do_printks-1992 finishes printing [it does a limited number of printks],
it does not compete for console_sem anymore, so it is other tasks
that have to flush the pending messages stored by i_do_printks-1992       :(

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 10:36                       ` Sergey Senozhatsky
@ 2017-12-18 12:35                         ` Sergey Senozhatsky
  2017-12-18 13:51                         ` Petr Mladek
  1 sibling, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-18 12:35 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Tejun Heo, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On (12/18/17 19:36), Sergey Senozhatsky wrote:
[..]
> it takes call_console_drivers() 0.01+ of a second to print some of
> the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
> and fully blame call_console_drivers()]. so vprintk_emit() seems to be
> gazillion times faster and i_do_printks can add tons of messages while
> some other process prints just one.
> 
> 
> to be more precise, I see from the traces that i_do_printks can add 1100
> messages to the logbuf while call_console_drivers() prints just one.
> 
> 
> systemd-udevd-671 owns the lock and sets console_owner. i_do_printks
> keeps adding printks. there is kworker/0:1-135 that gets just ahead of
> i_do_printks-1992 and registers itself as the console_sem waiter.
> 
>    systemd-udevd-671   [003] d..3    66.334866: offloading: set console_owner  :0
>      kworker/0:1-135   [000] d..2    66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
>     i_do_printks-1992  [002] d..2    66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0    x 1100
>    ...
>    systemd-udevd-671   [003] d..3    66.345917: offloading: clear console_owner  waiter != NULL :1
> 
> 
> i_do_printks-1992 finishes printing [it does a limited number of printks],
> it does not compete for console_sem anymore, so it is other tasks
> that have to flush the pending messages stored by i_do_printks-1992       :(

even in this case the task that took over printing had to flush logbuf
messages worth 1100 x 0.01s == 10+ seconds of printing, which is
enough to cause problems. if there are 2200 messages in the logbuf, then
there will be 2200 x 0.01 == 20+ seconds of printing. if the task is
atomic, then you can probably imagine what will happen. numbers don't lie.
if we have enough tasks competing for console_sem then the task that
actually fills up the logbuf may never get to call_console_drivers().
so the lockups are still very much possible. *in my particular case*

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18  9:36                     ` Sergey Senozhatsky
  2017-12-18 10:36                       ` Sergey Senozhatsky
@ 2017-12-18 13:31                       ` Petr Mladek
  2017-12-18 13:39                         ` Sergey Senozhatsky
  2017-12-18 14:10                         ` Petr Mladek
  1 sibling, 2 replies; 79+ messages in thread
From: Petr Mladek @ 2017-12-18 13:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> On (12/15/17 10:08), Petr Mladek wrote:
> 1) it opens both soft and hard lockup vectors
> 
>    I see *a lot* of cases when the CPU that calls printk in a loop does not
>    end up flushing its messages. And the problem seems to be - preemption.
> 
> 
>   CPU0						CPU1
> 
>   for_each_process_thread(g, p)
>     printk()

You print one message for each process in a tight loop.
Is there a code like this?

I think that a more realistic case is to print a message for
each CPU. But even in this case, the messages are usually
written by each CPU separately and thus less synchronized.

One exception might be the backtraces of all CPUs. These
are printed in NMI and flushed synchronously from a
single CPU. But they were recently optimized by
not printing idle threads.


> > Why did you completely ignored the paragraph about step by step
> > approach? Is there anything wrong about it?
> 
> frankly, I don't even understand what our plan is. I don't see how are
> we going to verify the patch. over the last 3 years, how many emails
> do you have from facebook or samsung or linaro, or any other company
> reporting the printk-related lockups? I don't have that many in my gmail
> inbox, to be honest. and this is not because there were no printk lockups
> observed. this is because people usually don't report those issues to
> the upstream community. especially vendors that use "outdated" LTS
> kernels, which are approx 1-2 year(s) behind the mainline. and I don't see
> why it should be different this time. it will take years before vendors
> pick the next LTS kernel, which will have that patch in it. but the really
> big problem here is that we already know that the patch has some problems.
> are we going to conclude that "no emails === no problems"? with all my
> respect, it does seem like, in the grand scheme of things, we are going
> to do the same thing, yet expecting a different result. that's my worry.

The patches for offloading printk console work have been floating around
for about 5 years and nothing went upstream. Therefore 2 more years look
acceptable if we actually could make things a bit better now. Well, I
believe that we would know the usefulness earlier than this.


> > You are looking for a perfect solution.
> 
> gentlemen, once again, you really can name it whatever you like.
> the bottom line is [speaking for myself only]:
> 
> -  the patch does not fit my needs.
> -  it does not address the issues I'm dealing with.

I am still missing some real life reproducer of the problems.

> -  it has a significantly worse behaviour compared to old async printk.
> -  it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE
>       for a long time.
> -  it times out user space apps.
> -  it re-introduces the lockup vector, by passing console_sem ownership
>       to atomic tasks.

All this is in the current upstream code as well. Steven's patch
should make it better in comparison with the current upstream code.

Sure, the printk offload approach does all these things better.
But there is still the fear that the offload is not reliable
in all situations. The lazy offload should handle this well from
my POV but I am not 100% sure.

If we have hard data (real life reproducers) in hand, we could
start pushing the offloading again.


> -  it doubles the amount of time CPU spins with local IRQs disabled in
>       the worst case.

It happens only during the hand shake. And the other CPU waits only
until a single line is flushed. It is much less than what a single CPU
might spend flushing lines.


> -  I probably need to run more tests [I didn't execute any OOM tests, etc.],
>       but, preliminarily, I can't see Samsung using this patch.

Does Samsung already use some offload implementation?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 13:31                       ` Petr Mladek
@ 2017-12-18 13:39                         ` Sergey Senozhatsky
  2017-12-18 14:13                           ` Petr Mladek
  2017-12-18 14:10                         ` Petr Mladek
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-18 13:39 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo,
	Sergey Senozhatsky, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 14:31), Petr Mladek wrote:
> On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > On (12/15/17 10:08), Petr Mladek wrote:
> > 1) it opens both soft and hard lockup vectors
> > 
> >    I see *a lot* of cases when the CPU that calls printk in a loop does not
> >    end up flushing its messages. And the problem seems to be - preemption.
> > 
> > 
> >   CPU0						CPU1
> > 
> >   for_each_process_thread(g, p)
> >     printk()
> 
> You print one message for each process in a tight loop.
> Is there a code like this?

um... show_state() -> show_state_filter()?
which prints a million times more info than a single line per-PID.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 10:36                       ` Sergey Senozhatsky
  2017-12-18 12:35                         ` Sergey Senozhatsky
@ 2017-12-18 13:51                         ` Petr Mladek
  1 sibling, 0 replies; 79+ messages in thread
From: Petr Mladek @ 2017-12-18 13:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Mon 2017-12-18 19:36:24, Sergey Senozhatsky wrote:
> it takes call_console_drivers() 0.01+ of a second to print some of
> the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
> and fully blame call_console_drivers()]. so vprintk_emit() seems to be
> gazillion times faster and i_do_printks can add tons of messages while
> some other process prints just one.
> 
> to be more precise, I see from the traces that i_do_printks can add 1100
> messages to the logbuf while call_console_drivers() prints just one.

This sounds interesting.

A solution would be to add some "simple" throttling. We could add
a per-CPU or per-process counter that would count the number
of lines added while other CPU is processing one line.

The counter would be incremented only when the CPU is actively
printing.

One question is how to clear the counter. One possibility
would be to limit it to one scheduling period or so.
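
For illustration only, here is a minimal sketch of such a per-CPU counter
(the names printk_burst and PRINTK_BURST_LIMIT are made up, and resetting
the counters whenever the console owner makes progress is just one of the
possible clearing policies):

	#include <linux/percpu.h>
	#include <linux/cpumask.h>

	/* illustrative sketch only, not from any posted patch */
	static DEFINE_PER_CPU(unsigned int, printk_burst);
	#define PRINTK_BURST_LIMIT	100

	/* would be called from vprintk_emit() after log_store(), while
	 * some other task owns the console and is actively printing */
	static bool printk_should_throttle(void)
	{
		return this_cpu_inc_return(printk_burst) > PRINTK_BURST_LIMIT;
	}

	/* would be called by the console owner after flushing each line */
	static void printk_burst_reset(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			per_cpu(printk_burst, cpu) = 0;
	}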

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 13:31                       ` Petr Mladek
  2017-12-18 13:39                         ` Sergey Senozhatsky
@ 2017-12-18 14:10                         ` Petr Mladek
  2017-12-19  1:09                           ` Sergey Senozhatsky
  1 sibling, 1 reply; 79+ messages in thread
From: Petr Mladek @ 2017-12-18 14:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Mon 2017-12-18 14:31:01, Petr Mladek wrote:
> On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > -  it has a significantly worse behaviour compared to old async printk.
> > -  it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE
> >       for a long time.
> > -  it times out user space apps.
> > -  it re-introduces the lockup vector, by passing console_sem ownership
> >       to atomic tasks.
> 
> All this is in the current upstream code as well. Steven's patch
> should make it better in comparison with the current upstream code.
> 
> Sure, the printk offload approach does all these things better.
> But there is still the fear that the offload is not reliable
> in all situations. The lazy offload should handle this well from
> my POV but I am not 100% sure.

BTW: There is one interesting thing. If we rely on the kthread
to handle a flood of messages, it might be too slow because it
reschedules. It might cause losing messages. Note that
the kthread should have a rather normal priority to avoid
blocking other processes.

From this point of view, we get the messages out much faster
in an atomic context. The question is if we want to sacrifice
the atomic context of a random process for this.

Just an idea. The handshake + throttling of the big sinners might
help to balance the load over the biggest sinners (the heaviest
printk users).

The nice thing about the hand-shake + throttling based approach
is that it might be built step by step. Each step would just
make things better, while the offloading is rather a big
revolution. We already have many extra patches to avoid
regressions in reliability.

One thing is that we have played with offloading for years. The handshake
might have problems that we just do not know about at the moment.
But it definitely has some interesting aspects.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 13:39                         ` Sergey Senozhatsky
@ 2017-12-18 14:13                           ` Petr Mladek
  2017-12-18 17:46                             ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Petr Mladek @ 2017-12-18 14:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Mon 2017-12-18 22:39:48, Sergey Senozhatsky wrote:
> On (12/18/17 14:31), Petr Mladek wrote:
> > On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > > On (12/15/17 10:08), Petr Mladek wrote:
> > > 1) it opens both soft and hard lockup vectors
> > > 
> > >    I see *a lot* of cases when the CPU that calls printk in a loop does not
> > >    end up flushing its messages. And the problem seems to be - preemption.
> > > 
> > > 
> > >   CPU0						CPU1
> > > 
> > >   for_each_process_thread(g, p)
> > >     printk()
> > 
> > You print one message for each process in a tight loop.
> > Is there a code like this?
> 
> um... show_state() -> show_state_filter()?
> which prints a million times more info than a single line per-PID.

Good example. Heh, it already somehow deals with this:

void show_state_filter(unsigned long state_filter)
{
	struct task_struct *g, *p;

	rcu_read_lock();
	for_each_process_thread(g, p) {
		/*
		 * reset the NMI-timeout, listing all files on a slow
		 * console might take a lot of time:
		 * Also, reset softlockup watchdogs on all CPUs, because
		 * another CPU might be blocked waiting for us to process
		 * an IPI.
		 */
		touch_nmi_watchdog();
		touch_all_softlockup_watchdogs();
		if (state_filter_match(state_filter, p))
			sched_show_task(p);


One question is if we really want to rely on offloading in
this case. What if this is printed to debug some stalled
system.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 14:13                           ` Petr Mladek
@ 2017-12-18 17:46                             ` Steven Rostedt
  2017-12-19  1:03                               ` Sergey Senozhatsky
  2017-12-19  4:36                               ` Sergey Senozhatsky
  0 siblings, 2 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-18 17:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Mon, 18 Dec 2017 15:13:53 +0100
Petr Mladek <pmladek@suse.com> wrote:

> One question is if we really want to rely on offloading in
> this case. What if this is printed to debug some stalled
> system.

Correct, and this is what I call when debugging hard lockups, and I do
it from NMI. And the new NMI code prevents all the data I want to
print from coming out to the console.

I had to create a really huge buffer to print it.

show_state_filter() is not a normal printk() call. It is used for
debugging. Not a very good example of issues that happen on production
systems. If anything, this should be disabled on a production system.

Let's just add my patch (I'll respin it if it needs it), and send it
off into the wild. Let's see if there's still reports of issues, and
then come back to solutions. Because, really, I'm still not convinced
that there's anything out there that needs much more "fixing" of
printk().

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-15 15:19               ` Steven Rostedt
@ 2017-12-19  0:52                 ` Sergey Senozhatsky
  2017-12-19  1:03                   ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  0:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Petr Mladek,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

Hello Steven,

I couldn't reply sooner.


On (12/15/17 10:19), Steven Rostedt wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following preconditions:
> > > > 
> > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > >     etc) context
> > > > 
> > > >         what does guarantee that? what happens if there is NO non-atomic
> > > >         CPU or that non-atomic simplky missses the console_owner != false
> > > >         point? we are going to conclude
> > > > 
> > > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > > 
> > > > 
> > > >         what if that non-atomic CPU does not call printk(), but instead
> > > >         it does console_lock()/console_unlock()? why there is no handoff?
> 
> The case here, you are talking about a CPU doing console_lock() from a
> non printk() case. Which is what I was asking about how often this
> happens.

I'd say often enough. but the point I was trying to make is that we can
have non-atomic CPUs which can do the print out, instead of "sharing the
load" between atomic CPUs.

> As for why there's no handoff. Does the non printk()
> console_lock/unlock ever happen from a critical location? I don't think
> it does (but I haven't checked). Then it is the perfect candidate to do
> all the printing.

that's right. that is the point I was trying to make. we can have better
candidates to do all the printing.

[..]
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the configuration).
> > 
> 
> Is the above showing an issue of console_lock happening in the non
> printk() case?
>
> > do we periodically do console_lock() on a running system? yes, we do.
> > add to console_unlock()
> 
> Right, and the non printk() console_lock() should be fine to do all
> printing when it unlocks.

that's right.

> > this argument is really may be applied against your patch as well. I
> > really don't want us to have this type of "technical" discussion.
> 
> Sure, but my patch fixes the unfair approach that printk currently does.

I did tests yesterday, traces are available. I can't conclude that
the patch fixes the unfairness of printk().


> > printk() is a tool for developers. but developers can't use it.
> > 
> > 
> > > >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> > > >     console_owner is set.  
> > > 
> > > I haven't looked at the latest code, but my last patch didn't care
> > > about "atomic" and "non-atomic"  
> > 
> > I know. and I think it is sort of a problem.
> 
> Please show me the case that it is. And don't explain where it is.
> Please apply the patch and have the problem occur and show it to me.
> That's all that I'm asking for.

I did some tests yesterday. I posted analysis and traces.

[..]
> No, because it is unrealistic. For example:

right.

> +static void test_noirq_console_unlock(void)
> +{
> +       unsigned long flags;
> +       unsigned long num_messages = 0;
> +
> +       pr_err("=== TEST %s\n", __func__);
> +
> +       num_messages = 0;
> +       console_lock();
> +       while (num_messages++ < max_num_messages)
> +               pr_info("=== %s Append message %lu out of %lu\n",
> +                               __func__,
> +                               num_messages,
> +                               max_num_messages);
> +
> +       local_irq_save(flags);
> +       console_unlock();
> 
> Where in the kernel do we do this?

the funny thing is that we _are going to start doing this_ with
the console_owner hand off enabled.

consider the following case

we have console_lock() from non-atomic context. the console_sem owner is
getting preempted, under console_sem, which is totally possible and
happens a lot. in the meantime we have OOM, which can print a lot of
info. by the time the console_sem owner returns back to TASK_RUNNING the
logbuf contains some number of pending messages [let's say 10 seconds
worth of printing]. the console owner goes to console_unlock(). incidentally
we have a printk from IRQ on CPUz. console_owner hands over printing
duty to CPUz. so now we have to print 10 seconds worth of OOM messages
from irq.



CPU0                        CPU1 ~ CPUx                     CPUz

console_lock

 << preempted >>


   OOM                    OOM printouts, lots
                          of OOM traces, etc.

                          OOM end [progress done].

 << back to RUNNING >>

  console_unlock()
    
    for (;;)
      sets console_owner
      call_console_drivers()				  IRQ
                                                         printk
							  sees console_owner
							  sets console_waiter

      clears console_owner
      sees console_waiter
      handoff
                                                          for (;;) {
							     call_console_drivers()
							     ??? lockup
							  }
							  up()

this is how we down() from non-atomic and up() from atomic [if we make
it to up(). we might end up in NMI panic]. this scenario is totally possible,
isn't it? the optimistic expectation here is that some other printk() from
non-atomic CPU will jump in and take over printing from atomic CPUz. but I
don't see why we are counting on it.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 17:46                             ` Steven Rostedt
@ 2017-12-19  1:03                               ` Sergey Senozhatsky
  2017-12-19  1:08                                 ` Steven Rostedt
  2017-12-19  4:36                               ` Sergey Senozhatsky
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  1:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 12:46), Steven Rostedt wrote:
> On Mon, 18 Dec 2017 15:13:53 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > One question is if we really want to rely on offloading in
> > this case. What if this is printed to debug some stalled
> > system.
> 
> Correct, and this is what I call when debugging hard lockups, and I do
> it from NMI. And the new NMI code prevents all the data I want to
> print from coming out to the console.
> 
> I had to create a really huge buffer to print it.
> 
> show_state_filter() is not a normal printk() call. It is used for
> debugging. Not a very good example of issues that happen on production
> systems. If anything, this should be disabled on a production system.
> 
> Let's just add my patch (I'll respin it if it needs it), and send it
> off into the wild. Let's see if there's still reports of issues, and
> then come back to solutions. Because, really, I'm still not convinced
> that there's anything out there that needs much more "fixing" of
> printk().

... do you guys read my emails? which part of the traces I have provided
suggests that there is any improvement?

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  0:52                 ` Sergey Senozhatsky
@ 2017-12-19  1:03                   ` Steven Rostedt
  0 siblings, 0 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-19  1:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Tue, 19 Dec 2017 09:52:48 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> > The case here, you are talking about a CPU doing console_lock() from a
> > non printk() case. Which is what I was asking about how often this
> > happens.  
> 
> I'd say often enough. but the point I was trying to make is that we can
> have non-atomic CPUs which can do the print out, instead of "sharing the
> load" between atomic CPUs.

We don't even know if sharing between "atomic" and "non-atomic" is an
issue. Anything that does a printk() in an atomic location is going to
have latency to begin with.

> 
> > As for why there's no handoff. Does the non printk()
> > console_lock/unlock ever happen from a critical location? I don't think
> > it does (but I haven't checked). Then it is the perfect candidate to do
> > all the printing.  
> 
> that's right. that is the point I was trying to make. we can have better
> candidates to do all the printing.

Sure, but we don't even know if we have to. A problem scenario hasn't
come up that wasn't due to the current implementation (which my patch
changes).


> I did tests yesterday, traces are available. I can't conclude that
> the patch fixes the unfairness of printk().

It doesn't fix the "unfairness"; it fixes the unboundedness of printk.
That is what has been triggering all the issues from before.



> consider the following case
> 
> we have console_lock() from non-atomic context. the console_sem owner is
> getting preempted, under console_sem, which is totally possible and
> happens a lot. in the meantime we have OOM, which can print a lot of
> info. by the time the console_sem owner returns back to TASK_RUNNING the
> logbuf contains some number of pending messages [let's say 10 seconds
> worth of printing]. the console owner goes to console_unlock(). incidentally
> we have a printk from IRQ on CPUz. console_owner hands over printing
> duty to CPUz. so now we have to print 10 seconds worth of OOM messages
> from irq.

Yes, that can happen. But printk()s from irq context are not nice to have
either, and should only happen when things are going wrong to begin
with.

> 
> 
> 
> CPU0                        CPU1 ~ CPUx                     CPUz
> 
> console_lock
> 
>  << preempted >>
> 
> 
>    OOM                    OOM printouts, lots
>                           of OOM traces, etc.
> 
>                           OOM end [progress done].
> 
>  << back to RUNNING >>
> 
>   console_unlock()
>     
>     for (;;)
>       sets console_owner
>       call_console_drivers()				  IRQ
>                                                          printk
> 							  sees console_owner
> 							  sets console_waiter
> 
>       clears console_owner
>       sees console_waiter
>       handoff
>                                                           for (;;) {
> 							     call_console_drivers()
> 							     ??? lockup
> 							  }
> 							  up()
> 
> this is how we down() from non-atomic and up() from atomic [if we make
> it to up(). we might end up in NMI panic]. this scenario is totally possible,

The printk buffer needs to be very big, and bad things have to happen
first. This is a theoretical scenario, and I'd like to see it happen in
the real world before we try to fix it. My patch should make printk
behave *MUCH BETTER* than it currently does.

If you are worried about NMI panics, then we could add a touch nmi
within the printk loop.
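
A minimal sketch of what that could look like (illustrative only, not a
posted patch), against the console_unlock() snippet quoted earlier in the
thread; the exact placement of the call is just one option:

                raw_spin_lock(&console_owner_lock);
                console_owner = current;
                raw_spin_unlock(&console_owner_lock);

+               /* keep the NMI and softlockup detectors quiet while we
+                * flush a potentially large backlog, one line per loop */
+               touch_nmi_watchdog();
+
                /* The waiter may spin on us after setting console_owner */
                spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);

                stop_critical_timings();        /* don't trace print latency */
                call_console_drivers(level, text, len);
                start_critical_timings();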


> isn't it? the optimistic expectation here is that some other printk() from
> non-atomic CPU will jump in and take over printing from atomic CPUz. but I
> don't see why we are counting on it.

I don't see why we even care. Placing a printk in an atomic context is
a problem to begin with, and should only happen if there are issues in
the system.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  1:03                               ` Sergey Senozhatsky
@ 2017-12-19  1:08                                 ` Steven Rostedt
  2017-12-19  1:24                                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-19  1:08 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Tue, 19 Dec 2017 10:03:11 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:


> ... do you guys read my emails? which part of the traces I have provided
> suggests that there is any improvement?

The traces I've seen from you were from non-realistic scenarios. But I
have hit issues with printk()s happening that cause one CPU to do all
the work, where my patch would fix that. Those are the scenarios I'm
talking about.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 14:10                         ` Petr Mladek
@ 2017-12-19  1:09                           ` Sergey Senozhatsky
  0 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  1:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo,
	Sergey Senozhatsky, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 15:10), Petr Mladek wrote:
[..]
> > All this is in the current upstream code as well. Steven's patch
> > should make it better in comparison with the current upstream code.
> > 
> > Sure, the printk offload approach does all these things better.
> > But there is still the fear that the offload is not reliable
> > in all situations. The lazy offload should handle this well from
> > my POV but I am not 100% sure.
> 
> BTW: There is one interesting thing. If we rely on the kthread
> to handle a flood of messages, it might be too slow because it
> reschedules. It might cause losing messages. Note that
> the kthread should have a rather normal priority to avoid
> blocking other processes.

... and this is why preemption is disabled in console_unlock()
in the patch set I have posted.

obviously you haven't even looked at the patches. wonderful.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  1:08                                 ` Steven Rostedt
@ 2017-12-19  1:24                                   ` Sergey Senozhatsky
  2017-12-19  2:03                                     ` Steven Rostedt
  2017-12-19 14:31                                     ` Michal Hocko
  0 siblings, 2 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  1:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 20:08), Steven Rostedt wrote:
> > ... do you guys read my emails? which part of the traces I have provided
> > suggests that there is any improvement?
> 
> The traces I've seen from you were from non-realistic scenarios.
> But I have hit issues with printk()s happening that cause one CPU to do all
> the work, where my patch would fix that. Those are the scenarios I'm
> talking about.

any hints about what makes your scenario more realistic than mine?
to begin with, what was the scenario?

[..]

> But I have hit issues with printk()s happening that cause one CPU to do all
> the work, where my patch would fix that. Those are the scenarios I'm
> talking about.

and this is exactly what I'm still observing. i_do_printks-1992 stops
printing, while console_sem is owned by another task. Since log_store() is
much faster than call_console_drivers() AND the console_sem owner is getting
preempted for an unknown period of time, we end up having pending messages
in the logbuf... and it's kworker/0:1-135 that prints them all.

   systemd-udevd-671   [003] d..3    66.334866: offloading: set console_owner
     kworker/0:1-135   [000] d..2    66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
    i_do_printks-1992  [002] d..2    66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0    x 1100
   ...
   systemd-udevd-671   [003] d..3    66.345917: offloading: clear console_owner  waiter != NULL :1

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  1:24                                   ` Sergey Senozhatsky
@ 2017-12-19  2:03                                     ` Steven Rostedt
  2017-12-19  2:46                                       ` Sergey Senozhatsky
  2017-12-19 14:31                                     ` Michal Hocko
  1 sibling, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-19  2:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Tue, 19 Dec 2017 10:24:55 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> On (12/18/17 20:08), Steven Rostedt wrote:
> > > ... do you guys read my emails? which part of the traces I have provided
> > > suggests that there is any improvement?  
> > 
> > The traces I've seen from you were from non-realistic scenarios.
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.  
> 
> any hints about what makes your scenario more realistic than mine?
> to begin with, what was the scenario?

It was a while ago when I hit it. I think it was an OOM issue. And it
wasn't contrived. It happened on a production system.

> 
> [..]
> 
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.  
> 
> and this is exactly what I'm still observing. i_do_printks-1992 stops
> printing, while console_sem is owned by another task. Since log_store() is
> much faster than call_console_drivers() AND the console_sem owner is getting
> preempted for an unknown period of time, we end up having pending messages
> in the logbuf... and it's kworker/0:1-135 that prints them all.
> 
>    systemd-udevd-671   [003] d..3    66.334866: offloading: set console_owner
>      kworker/0:1-135   [000] d..2    66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
>     i_do_printks-1992  [002] d..2    66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0    x 1100
>    ...
>    systemd-udevd-671   [003] d..3    66.345917: offloading: clear console_owner  waiter != NULL :1

And kworker will still be bounded in what it can print. Yes it may end
up being the entire buffer, but that should not take longer than a
watchdog.

If that proves to be an issue in the real world, then we could simply
wake up an offloaded thread, if the current owner does more than one
iteration (more than what it wrote). Then when the thread wakes up, it
simply does a printk, and it will take over by the waiter logic.

But that is only if it still appears to be an issue.
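
Roughly, that idea could be sketched like this (printk_kthread, lines_printed
and lines_stored are made-up names here; none of this is in the posted patch):

	/*
	 * in console_unlock(): once we have flushed more lines than we
	 * stored ourselves, ask a helper thread to become the next waiter
	 */
	if (lines_printed > lines_stored && printk_kthread)
		wake_up_process(printk_kthread);

	/*
	 * the helper thread: a printk() is enough to enter the
	 * console_owner/console_waiter hand-off path and take over
	 */
	static int printk_kthread_func(void *arg)
	{
		while (!kthread_should_stop()) {
			set_current_state(TASK_INTERRUPTIBLE);
			schedule();
			printk(KERN_DEBUG "printk: taking over console flushing\n");
		}
		return 0;
	}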

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  2:03                                     ` Steven Rostedt
@ 2017-12-19  2:46                                       ` Sergey Senozhatsky
  2017-12-19  3:38                                         ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  2:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 21:03), Steven Rostedt wrote:
> > and this is exactly what I'm still observing. i_do_printks-1992 stops
> > printing, while console_sem is owned by another task. Since log_store() is
> > much faster than call_console_drivers() AND the console_sem owner is getting
> > preempted for an unknown period of time, we end up having pending messages
> > in the logbuf... and it's kworker/0:1-135 that prints them all.
> > 
> >    systemd-udevd-671   [003] d..3    66.334866: offloading: set console_owner
> >      kworker/0:1-135   [000] d..2    66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
> >     i_do_printks-1992  [002] d..2    66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0    x 1100
> >    ...
> >    systemd-udevd-671   [003] d..3    66.345917: offloading: clear console_owner  waiter != NULL :1
> 
> And kworker will still be bounded in what it can print. Yes it may end
> up being the entire buffer, but that should not take longer than a
> watchdog.

not the case on my setup. "1100 messages" already takes longer than the
watchdog threshold. consoles don't scale. if anyone's console can keep up
with 2 printing CPUs, then let's see what logbuf size that person will set
on a system with 1024 CPUs under OOM. I doubt that it will be 128KB.

anyway,
before you guys push the patch to printk.git, can we wait for Tejun to
run his tests against it? (or do we have a preemptive "non realistic
tests" conclusion?)

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  2:46                                       ` Sergey Senozhatsky
@ 2017-12-19  3:38                                         ` Steven Rostedt
  2017-12-19  4:58                                           ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-19  3:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Tue, 19 Dec 2017 11:46:10 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> anyway,
> before you guys push the patch to printk.git, can we wait for Tejun to
> run his tests against it?

I've been asking for that since day one ;-)

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-18 17:46                             ` Steven Rostedt
  2017-12-19  1:03                               ` Sergey Senozhatsky
@ 2017-12-19  4:36                               ` Sergey Senozhatsky
  1 sibling, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  4:36 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On (12/18/17 12:46), Steven Rostedt wrote:
> > One question is if we really want to rely on offloading in
> > this case. What if this is printed to debug some stalled
> > system.
> 
> Correct, and this is what I call when debugging hard lockups, and I do
> it from NMI.
[..]
> show_state_filter() is not a normal printk() call. It is used for
> debugging.

just for the record. a side note.

you guys somehow made spectacularly off-target conclusions from the
traces I have provided and decided NOT to concentrate on the demonstrated
behavioural patterns, but on, perhaps, process names (I really should
have renamed i_do_printks to DONALD_TRUMP ;) ) and on how those printk
lines got into the logbuf. as if it mattered. [seriously, why?]. the
point was not in show_state_filter()... the point was preemption and the
things that hand off does. but somehow filling up the logbuf when the
console_sem owner is preempted is unrealistic if printks are coming from
task A under normal conditions; and it is a completely different story when
the same task A fills up the logbuf from OOM while the console_sem owner is
preempted. the end result is the same in both cases: it's not task A
that is going to flush the logbuf. it's some other task that will have to
do it, possibly being in atomic context. anyway, anyway.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  3:38                                         ` Steven Rostedt
@ 2017-12-19  4:58                                           ` Sergey Senozhatsky
  2017-12-19 14:40                                             ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-19  4:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/18/17 22:38), Steven Rostedt wrote:
> On Tue, 19 Dec 2017 11:46:10 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> 
> > anyway,
> > before you guys push the patch to printk.git, can we wait for Tejun to
> > run his tests against it?
> 
> I've been asking for that since day one ;-)

ok

so you are not convinced that my scenarios are real/matter; I'm not
convinced that I have stable and functional boards with this patch ;)
seems that we are coming to a dead end.

for the record,
I'm not going to block the patch if you want it to be merged.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  1:24                                   ` Sergey Senozhatsky
  2017-12-19  2:03                                     ` Steven Rostedt
@ 2017-12-19 14:31                                     ` Michal Hocko
  2017-12-20  7:10                                       ` Sergey Senozhatsky
  1 sibling, 1 reply; 79+ messages in thread
From: Michal Hocko @ 2017-12-19 14:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On Tue 19-12-17 10:24:55, Sergey Senozhatsky wrote:
> On (12/18/17 20:08), Steven Rostedt wrote:
> > > ... do you guys read my emails? which part of the traces I have provided
> > > suggests that there is any improvement?
> > 
> > The traces I've seen from you were from non-realistic scenarios.
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.
> 
> any hints about what makes your scenario more realistic than mine?

Well, Tetsuo had some semi-realistic scenario where alloc stall messages
managed to stall other printk callers (namely oom handler). I am saying
semi-realistic because he is testing OOM throughput with an unrealistic
workload which itself is not very real on production systems. The
essential thing here is that many processes might be in the allocation
path and any printk there could just swamp some unrelated printk caller
and cause hard to debug problems. Steven's patch should address that
with a relatively simple lock handover. I was pessimistic this would
work sufficiently well but it survived Tetsuo's testing IIRC so it
sounds good enough to me.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19  4:58                                           ` Sergey Senozhatsky
@ 2017-12-19 14:40                                             ` Steven Rostedt
  2017-12-20  7:46                                               ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2017-12-19 14:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel

On Tue, 19 Dec 2017 13:58:46 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:


> so you are not convinced that my scenarios are real/matter; I'm not

Well, not with the test module. I'm looking for actual code in the
upstream kernel.

> convinced that I have stable and functional boards with this patch ;)
> seems that we are coming to a dead end.

Can I ask, is it any worse than what we have today?

> 
> for the record,
> I'm not going to block the patch if you want it to be merged.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19 14:31                                     ` Michal Hocko
@ 2017-12-20  7:10                                       ` Sergey Senozhatsky
  2017-12-20 12:06                                         ` Tetsuo Handa
  0 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-20  7:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sergey Senozhatsky, Steven Rostedt, Petr Mladek, Tejun Heo,
	Sergey Senozhatsky, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael Wysocki, Pavel Machek, Tetsuo Handa, linux-kernel

Hello,

not sure if you've been following the whole thread, so I'll try
to summarize it here. apologies if it'll massively repeat the things
that have already been said or will be too long.

On (12/19/17 15:31), Michal Hocko wrote:
> On Tue 19-12-17 10:24:55, Sergey Senozhatsky wrote:
> > On (12/18/17 20:08), Steven Rostedt wrote:
> > > > ... do you guys read my emails? which part of the traces I have provided
> > > > suggests that there is any improvement?
> > > 
> > > The traces I've seen from you were from non-realistic scenarios.
> > > But I have hit issues with printk()s happening that cause one CPU to do all
> > > the work, where my patch would fix that. Those are the scenarios I'm
> > > talking about.
> > 
> > any hints about what makes your scenario more realistic than mine?
> 
> Well, Tetsuo had some semi-realistic scenario where alloc stall messages
> managed to stall other printk callers (namely oom handler). I am saying
> sem-realistic because he is testing OOM throughput with an unrealistic
> workload which itself is not very real on production systems. The
> essential thing here is that many processes might be in the allocation
> path and any printk there could just swamp some unrelated printk caller
> and cause hard to debug problems. Steven's patch should address that
> with a relatively simple lock handover. I was pessimistic this would
> work sufficiently well but it survived Tetsuo's testing IIRC so it
> sounds good enough to me.

sure, no denial. Tetsuo indeed said that Steven's patch passed his
test. and for the note, the most recent printk_kthread patch set passed
Tetsuo's test as well; I asked him privately to run the tests before I
published it. but this is not the point. and to make it clear - this is
not a "Steven's patch" vs. "my patch set + Steven's patch atop of it"
type of thing. not at all.


IMPORTANT DISCLAIMER
   I SPEAK FOR MYSELF ONLY. ABOUT MY OBSERVATION ONLY. THIS IS AN
   ATTEMPT TO ANALYSE WHY THE PATCH DIDN'T WORK ON MY SETUP AND WHY
   MY SETUP NEEDS ANOTHER APPROACH. THIS IS NOT TO WARN ANYONE THAT
   THE PATCH WON'T WORK ON THEIR SETUPS. I MEAN NO OFFENSE AND AM
   NOT TRYING TO LOOK/SOUND SMART. AND, LIKE I SAID, IF STEVEN OR
   PETR WANT TO PUSH THE PATCH, I'M NOT GOING TO BLOCK IT.

so why has Steven's patch not succeeded on my boards?... partly because
of the fact that printk is "multidimensional" in its complexity and
Steven's patch just doesn't touch some of those problematic parts; partly
because the patch has requirements which can't be 100% true on my
boards.

to begin with,
the patch works only when the console_sem is contended. IOW, when
multiple sites perform printk-s concurrently, frequently enough for the
hand off logic to detect it and to pass control to another CPU.
but this turned out to be a bit hard to count on. why? several problems.

(1) the really big one:
   the console_sem owner can be preempted under console_sem, removing any
   console_sem competition: it's already locked, but its owner is preempted.
   this removes any possibility of hand off. and this unleashes CPUs that
   do printk-s, because when console_sem is locked, printk-s from other
   CPUs become, basically, as fast as sprintf+memcpy.

(1.1) the really big one, part two. more on this later. see @PART2

(2) another big one:
   the printk() fast path - sprintf+memcpy - can be significantly faster than
   a call to the console drivers. on my board I see that the printk CPU can
   add 1140 new messages to the logbuf while the active console_sem owner
   prints a single message to the serial console. not to mention console_sem
   owner preemption.

(1) and (2) combined can do magical things. on my extremely trivial
test -- not an insanely huge number of printks (much less than 2000 lines)
from a preemptible context, not so many printk-s from other CPUs -- I
can see, and I confirmed that with the traces, that when console_sem is
getting handed over to another CPU and that new console_sem owner is
getting preempted, or when it begins to print messages to the serial console,
the CPU that actually does most of the printk-s finishes its job in almost no
time, because all it has to do is sprintf+memcpy. which basically means
that console_sem is not contended anymore, and thus it's either the current
console_sem owner or _maybe_ some other task that has to print all of
the pending messages.

and to make it worse, the hand off logic does not distinguish between
the contexts it's passing the console_sem ownership to. it will be equally
happy to hand off console_sem to an atomic context from a non-atomic one,
or from atomic to another atomic (and vice versa), regardless of the number
of pending messages in the logbuf. more on this later [see @LATER].

now, Steven and Petr said that my test was non-realistic and thus the
observations were invalid. OK, let's take a closer look at OOM. and
let's start with the question - what is so special about OOM that
makes (1) or (1.1) or (2) invalid?

can we get preempted when we call out_of_memory()->printk()->console_unlock()?

yes.

lkml.kernel.org/r/201612221927.BGE30207.OSFJMFLFOHQtOV@I-love.SAKURA.ne.jp
>
> somebody who called out_of_memory() is still preempted by other threads consuming
> CPU time due to cond_resched() from console_unlock() as demonstrated by below patch.
>

that is - we are preempted under console_sem. other CPUs keep adding
messages to the logbuf in the meantime.

can we get preempted for a very long time?

yes.

lkml.kernel.org/r/201612150136.GBC13980.FHQFLSOJOFOtVM@I-love.SAKURA.ne.jp
>
> In most cases, "Out of memory: " and "Killed process" lines are printed within 0.1
> second. But sometimes it took a few seconds. Less often it took longer than a minute.
> There was one big stall which lasted for minutes.
>

and yes.

lkml.kernel.org/r/201706060002.FCD65614.OFFLOVQtHSJFOM@I-love.SAKURA.ne.jp
>
> Regarding the OOM killer preempted by console_unlock() from printk()
> problem, it will be mitigated by offloading to the printk kernel thread.
> But offloading solves only the OOM killer preempted by console_unlock()
> case. The OOM killer can still be preempted by schedule_timeout_killable(1).
>

did Tetsuo try to fix that console_unlock() behaviour from the OOM side?
yes.

lkml.kernel.org/r/201706060002.FCD65614.OFFLOVQtHSJFOM@I-love.SAKURA.ne.jp
>
> This change is a subset of enclosing whole oom_kill_process() steps
> with preempt_disable()/preempt_enable(), which was already rejected.
>

and we also already know that preemption from console_unlock() does
interfere with OOM

lkml.kernel.org/r/201612142037.EED00059.VJMOFLtSOQFFOH@I-love.SAKURA.ne.jp
>
> If we can map all printk() called inside oom_kill_process() to printk_deferred(),
> we can avoid cond_resched() inside console_unlock() with oom_lock held.
>

any other reports?

yes. for example:

lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL@I-love.SAKURA.ne.jp
>
> I'm trying to dump information of all threads which might be relevant
> to stalling inside memory allocator. But it seems to me that since this
> patch changed to allow calling cond_resched() from printk() if it is
> safe to do so, it is now possible that the thread which invoked the OOM
> killer can sleep for minutes with the oom_lock mutex held when my dump is
> in progress. I want to release oom_lock mutex as soon as possible so
> that other threads can call out_of_memory() to get TIF_MEMDIE and exit
> their allocations.
> 
> So, how can I prevent printk() triggered by out_of_memory() from sleeping
> for minutes with oom_lock mutex held? Guard it with preempt_disable() /
> preempt_enable() ? Guard it with rcu_read_lock() / rcu_read_unlock()?

and many more.


the point I'm trying to make is that (1) and (1.1) and (2) are still very
true even when the system is under an OOM condition. and we still can accumulate
a ton of messages in the logbuf. we expect that there will be a lot of printks
during OOM and that the hand off logic will handle it all nicely. no objections.
but the reports that we already have suggest that that expectation may be
slightly over-optimistic. *it might happen* (or it might not) that none
of those "a lot of printks" will be able to help us, simply because with the
hand off enabled console_sem contention does not matter at all as
long as the console_sem owner is preempted/rescheduled. so I guess it's
still possible that once we finally begin printing logbuf messages OOM might
be over and there might be no more concurrent printks from other CPUs, and
it's some poor task that single-handedly will have to flush the pending logbuf
messages. but we expect that this won't happen, and that there will be
other printk-s.

and to make it worse: this "one task that prints the entire logbuf"
is possible even on non-preemptible kernels, and even if we were to disable
preemption in console_unlock(). the reason is that console_sem is
a sleeping lock.

https://marc.info/?l=dri-devel&m=149938825811219
>
> I noticed that printing to consoles can stop forever because console drivers
> can wait for memory allocation when memory allocation cannot make progress.
>

which turned out to be due to this thing
https://marc.info/?l=linux-mm&m=149939515214223&w=2

>
> what you had is a combination of factors
>
>
>	CPU0			CPU1				CPU2
>								console_callback()
>								 console_lock()
>								 ^^^^^^^^^^^^^
>	vprintk_emit()		mutex_lock(&par->bo_mutex)
>				 kzalloc(GFP_KERNEL)
>	 console_trylock()	  kmem_cache_alloc()		  mutex_lock(&par->bo_mutex)
>	 ^^^^^^^^^^^^^^^^	   io_schedule_timeout
>

that is, the console_sem owner being dependent on kmalloc(), under OOM.
and this is what @PART2 "(1.1) the really big one, part two" is all
about. and yes, we may have a ton of printk messages during OOM. but
none of them will be worth a penny until we return from kzalloc(GFP_KERNEL).
and in the given scheme of things, it is possible that it's the console_sem
owner that single-handedly will have to print all pending logbuf messages.
but we expect that this won't happen, and that there will be other printk-s.

// not to mention that we even have direct console_sem dependencies on
// MM, like "console_lock(); kmalloc(GFP_KERNEL);". take a look at
// vc_allocate() or vc_do_resize(), etc.
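// condensed, the shape of that dependency is (an illustration, not the
// actual vc_allocate() code; `vc' is just a placeholder):
//
//	console_lock();				<- console_sem acquired
//	vc = kzalloc(sizeof(*vc), GFP_KERNEL);	<- may enter direct reclaim
//						   under OOM
//	...
//	console_unlock();			<- nothing hits the consoles
//						   until we get here
//
// and while we sit in reclaim, printk() on other CPUs can only
// log_store(): console_trylock() in vprintk_emit() fails, so the
// messages just pile up in the logbuf.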


and this is when we can recall @LATER. so let's say we have following:

> we have console_lock() from non-atomic context. console_sem owner is
> getting preempted, under console_sem. in the mean time we have OOM,
> which can print a lot of info. by the time console_sem returns back to
> TASK_RUNNING logbuf contains some number of pending messages. console
> owner goes to console_unlock(). accidentally we have printk from IRQ on
> CPUz. console_owner hands over printing duty to CPUz. so now we have to
> print OOM messages from irq.
>

CPU0                        CPU1 ~ CPUx                     CPUz

console_lock

 << preempted /
    rescheduled >>

              OOM printouts, lots  of OOM traces, etc.

 << back to RUNNING >>

  console_unlock()

    for (;;)
      sets console_owner
      call_console_drivers()                              IRQ
                                                         printk
                                                          sees console_owner
                                                          sets console_waiter

      clears console_owner
      sees console_waiter
      hand off
                                                          for (;;) {
                                                             call_console_drivers()
                                                             ??? lockup
                                                          }
                                                          up()

so we are happy to hand off printing from any context to any context.
Steven said that this scenario is possible, but is not of any particular
interest, because printk from IRQ or from any other atomic context is a
bad thing, which should happen only when something wrong is going on in
the system. but we are in OOM, or have just returned from OOM, which _is_
"something bad going on", isn't it? can we instead say - OOM makes
printk from atomic context more likely? if it does happen, will there be
non-atomic printk-s to take over printing from the atomic CPUz? we can't tell.
I don't know much about Tetsuo's test, but I assume that his VM does not
have any networking activity during the test. I probably wouldn't be so
surprised to see a bunch of printk-s from atomic contexts under OOM.

but the thing is, the very moment we do that  "non-atomic -> atomic"  hand
off/transfer, we are on very thin ice. why? because the fundamental and
crucial requirements that Steven's patch imposes on the workload and the
system are:

(A) an optimistic one:
   every atomic printk() must be followed by a non-atomic one within
   the watchdog_threshold (10 seconds) interval. and, additionally, that
   non-atomic printk must happen at the right moment, so we can hand
   off the console_sem ownership to it. otherwise, we can lockup the
   system, by looping in console_unlock() from atomic context for more
   than watchdog_threshold seconds.

surely it's rather hard to count on it. thus Steven's patch has the
second fundamental requirement:

(B) a pessimistic one:
   if we have only atomic context to evict logbuf messages, then the
   rule is - any atomic context must be able to flush the entire logbuf
   within the watchdog_threshold (10 seconds) interval.

do we see a problem here?

(B) basically means that the size of the logbuf *must* be limited based
on the performance of the system's slowest console. I don't know if
Tetsuo is using any network console or serial console, or if it's just
fbcon, which can be really fast compared to a serial console; but I know
that I'm using a serial console on my boards. and under no circumstances
can I ever have the logbuf small enough to flush it to a serial console
within 10 seconds. in fact, the slower the console is - the bigger the
logbuf is (of course I'm not speaking about insanely huge buffers). this
might sound counter-intuitive, but it makes sense: the smaller the buffer
is - the sooner you'll start losing kernel messages. but it's not just my
slow serial console to blame. consoles don't scale. the more CPUs the
system has, the more logbuf messages the console potentially has to print
(at the same transfer rate); say 1024 CPUs under OOM.
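
just to put rough numbers on it (assuming a 115200n8 serial console and
ignoring driver overhead - ballpark figures, not measurements):

	115200 baud / 10 bits per byte    ~= 11.5 KB/s
	128 KB of logbuf / 11.5 KB/s      ~= 11 seconds
	  1 MB of logbuf / 11.5 KB/s      ~= 90 seconds

so even a 128KB logbuf (a common default) is already at the edge of a
10 second watchdog_threshold, and anything bigger is simply hopeless.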


realistically,
I can't make (B) true. and I can't guarantee (A) to be true.

and that's why Steven's patch is, *UNFORTUNATELY*, not doing very
well on my boards. I simply can't fulfill its pre-conditions. sure,
there might be cases when Steven's patch does the right thing. but
I believe I already saw enough to conclude that on my boards I need
something different.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-19 14:40                                             ` Steven Rostedt
@ 2017-12-20  7:46                                               ` Sergey Senozhatsky
  0 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-20  7:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/19/17 09:40), Steven Rostedt wrote:
> On Tue, 19 Dec 2017 13:58:46 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> 
> > so you are not convinced that my scenarios real/matter; I'm not
> 
> Well, not with the test module. I'm looking for actual code in the
> upstream kernel.
> 
> > convinced that I have stable and functional boards with this patch ;)
> > seems that we are coming to a dead end.
> 
> Can I ask, is it any worse than what we have today?

that's a really hard question. both the existing printk() and the
tweaked printk() have the same things in common - a) preemption from
console_unlock() and b) printk() being way too fast compared to anything
else (call_console_drivers() and the preemption latency of the console_sem
owner). your patch puts some requirements on the system that my workload
simply cannot fulfill. so maybe if I push it towards OOM and so on, then
I'll see some difference (but both (a) and (b) will still stay true).

the thing that really changes everything is offloading to printk_kthread.
given that I can't have a tiny logbuf, and that I can't have a fast console,
and that I can't have tons of printks to choose from and to take advantage
of the hand off algorithm in any reliable way, I need something more to
guarantee that the current console_sem owner will not be forced to evict all
logbuf messages.
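
in its most minimal form the idea is nothing more than this (a sketch,
not the actual patch; printk_offload_wq and offload_requested() are
made-up names):

	static int printk_kthread_func(void *data)
	{
		while (!kthread_should_stop()) {
			wait_event_interruptible(printk_offload_wq,
						 offload_requested());
			/* take over the printing duty from whoever gave up */
			console_lock();
			/* console_unlock() flushes all pending logbuf messages */
			console_unlock();
		}
		return 0;
	}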

> > for the record,
> > I'm not going to block the patch if you want it to be merged.
> 
> Thanks,

I mean it :)

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-20  7:10                                       ` Sergey Senozhatsky
@ 2017-12-20 12:06                                         ` Tetsuo Handa
  2017-12-21  6:52                                           ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Tetsuo Handa @ 2017-12-20 12:06 UTC (permalink / raw)
  To: sergey.senozhatsky.work, mhocko
  Cc: rostedt, pmladek, tj, sergey.senozhatsky, jack, akpm, peterz,
	rjw, pavel, linux-kernel

Sergey Senozhatsky wrote:
> Steven said that this scenario is possible, but is not of any particular
> interest, because printk from IRQ or from any other atomic context is a
> bad thing, which should happen only when something wrong is going on in
> the system. but we are in OOM or has just returned from the OOM. which _is_
> "something bad going on", isn't it? can we instead say - OOM makes that
> printk from atomic context more likely? if it does happen, will there be
> non-atomic printk-s to take over printing from atomic CPUz? we can't tell.
> I don't know much about Tetsuo's test, but I assume that his VM does not
> have any networking activities during the test. I probably wouldn't be so
> surprised to see a bunch of printk-s from atomic contexts under OOM.

I'm using VMware Workstation Player, and my VM does not have any network
activity other than the ssh login session. Fortunately, VMware's serial console
(written to the host's file) is reliable enough to allow a console=ttyS0,115200n8
configuration. But there is virtualization software where the serial console is
so weak that I have to choose netconsole instead. Also, there are enterprise
servers where a very slow configuration (e.g. 1200 or 9600 baud) has to be used
for the serial console because the serial device is emulated using system
management interrupts instead of real hardware. Therefore, while it is true that
any approach would survive my environment, it is dangerous to assume that any
approach is safe for my customer's enterprise servers.

Thanks for summarizing the pointers. The safest way of not overflowing
printk() would be to use mutex_lock(&oom_lock) at __alloc_pages_may_oom() (and
yield the CPU to the thread flushing the logbuf), but so far we
have not come to an agreement. Fortunately, since warn_alloc() for reporting
allocation stalls was killed in 4.15-rc1, the risk of overflowing printk()
under OOM was reduced a lot. But yes, since my VM has little network
activity, printk() flooding due to allocation failures might still happen in
different VMs.

Anyway, the rule that "do not try to printk() faster than the kernel can
write to consoles" will remain no matter how printk() changes. I think that
any printk() user has to be careful not to waste CPU resources. MM's direct
reclaim + back off combination is a user that really loves to waste CPU
resources while someone is printk()ing.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-20 12:06                                         ` Tetsuo Handa
@ 2017-12-21  6:52                                           ` Sergey Senozhatsky
  0 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-21  6:52 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, mhocko, rostedt, pmladek, tj,
	sergey.senozhatsky, jack, akpm, peterz, rjw, pavel, linux-kernel

Hi Tetsuo,

On (12/20/17 21:06), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
>
[..]
>
> Anyway, the rule that "do not try to printk() faster than the kernel can
> write to consoles" will remain no matter how printk() changes.

and the "faster than the kernel can write to consoles" is tricky.
it does not depend ONLY on the performance of underlying console
devices. but also on the scheduler and IRQs. simply because of the
way we print:

void console_unlock(void)
{
	for (;;) {
		local_irq_save();

		text = pick_the_next_pending_logbuf_message();

		call_console_drivers(text);
		local_irq_enable();
				^^^^ preemption + irqs
	}
}

on my board call_console_drivers() can spend up to 0.01 seconds
printing a _single_ message, which basically means that I'm guaranteed
to get preempted in console_unlock(), under console_sem, after every
line it prints [if console_unlock() is invoked from preemptible context].
this, surely, has a huge impact on "do not try to printk() faster than
this".

but there are more points in my "why the patch doesn't work for me"
email than just "preemption in console_unlock()".

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-14 18:21         ` Steven Rostedt
@ 2017-12-22  0:09           ` Tejun Heo
  2017-12-22  4:19             ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Tejun Heo @ 2017-12-22  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello,

Sorry about the long delay.

On Thu, Dec 14, 2017 at 01:21:09PM -0500, Steven Rostedt wrote:
> > Yeah, will do, but out of curiosity, Sergey and I already described
> > what the root problem was and you didn't really seem to take that.  Is
> > that because the explanation didn't make sense to you or us
> > misunderstanding what your code does?
> 
> Can you post the message id of the discussion you are referencing.
> Because I've been swamped with other activities and only been skimming
> these threads.

This was already on another reply but just in case.

 http://lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com

I tried your v4 patch and ran the test module and could easily
reproduce RCU stall and other issues stemming from a CPU getting
pegged down by printk flushing.  I'm attaching the test module code at
the end.

I wrote this before but this isn't a theoretical problem.  We see
these stalls a lot.  Preemption isn't enabled to begin with.  Memory
pressure is high and OOM triggers and printk starts printing out OOM
warnings; then, a network packet comes in which triggers allocations
in the network layer, which fail due to memory pressure, which then
generate memory allocation failure messages, which then generate
netconsole packets, which then try to allocate more memory, and so on.

It's just that there's no one else to give that flushing duty to, so
the ping-ponging that your patch implements can't really help
anything.

You argue that it isn't made worse by your patch, which may be true
but your patch doesn't solve actual problems and is most likely an
unnecessary complication which gets in the way of the actual
solution.  It's a weird argument to push an approach which is
fundamentally broken.  Let's please stop that.

Thanks.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 10000; i++)
			printk("%-80s\n", "XXX TIMER");

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}	

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++)
			printk("%-80s\n", "XXX PREEMPT");
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);
	return 0;
}

static void __exit test_exit(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-22  0:09           ` Tejun Heo
@ 2017-12-22  4:19             ` Steven Rostedt
  2017-12-28  6:48               ` Sergey Senozhatsky
  2018-01-09 20:06               ` Tejun Heo
  0 siblings, 2 replies; 79+ messages in thread
From: Steven Rostedt @ 2017-12-22  4:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Thu, 21 Dec 2017 16:09:32 -0800
Tejun Heo <tj@kernel.org> wrote:
> 
> I tried your v4 patch and ran the test module and could easily
> reproduce RCU stall and other issues stemming from a CPU getting
> pegged down by printk flushing.  I'm attaching the test module code at
> the end.

Thanks, I'll take a look.

> 
> I wrote this before but this isn't a theoretical problem.  We see
> these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> pressure is high and OOM triggers and printk starts printing out OOM
> warnings; then, a network packet comes in which triggers allocations
> in the network layer, which fails due to memory pressure, which then
> generates memory allocation failure messages which then generates
> netconsole packets which then tries to allocate more memory and so on.

It doesn't matter if preemption is enabled or not. The hand off should
happen either way.


> 
> It's just that there's no one else to give that flushing duty too, so
> the pingpoinging that your patch implements can't really help
> anything.
> 
> You argue that it isn't made worse by your patch, which may be true
> but your patch doesn't solve actual problems and is most likely
> unnecessary complication which gets in the way for the actual
> solution.  It's a weird argument to push an approach which is
> fundamentally broken.  Let's please stop that.

BULLSHIT!

It's not a complex solution, and coming from the cgroup and workqueue
maintainer, that's a pretty ironic comment.

It has already been proven that it can solve problems:

  http://lkml.kernel.org/r/20171219143147.GB15210@dhcp22.suse.cz

You don't think handing off printks to an offloaded thread isn't more
complex nor can it cause even more issues (like more likely to lose
relevant information on kernel crashes)?

> 
> Thanks.
> 
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>
> 
> static bool in_printk;
> static bool stop_testing;
> static struct hrtimer printk_timer;
> static ktime_t timer_interval;
> 
> static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> {
> 	int i;
> 
> 	if (READ_ONCE(in_printk))
> 		for (i = 0; i < 10000; i++)
> 			printk("%-80s\n", "XXX TIMER");

WTF!

You are printing 10,000 printk messages from an interrupt context???
And to top it off, I ran this on my box, switching printk() to
trace_printk() (which is extremely low overhead). And it is triggered
on the same CPU that did the printk() itself. Yeah, there is no hand
off, because you are doing a shitload of printks on one CPU and nothing
on any of the other CPUs. This isn't the problem that my patch was set
out to solve, nor is it a very realistic problem. I added a counter to
the printk as well, to keep track of how many printks there were:

 # trace-cmd record -e printk -e irq

kworker/-1603    0...1  1571.783182: print:                (nil)s: start 0
kworker/-1603    0d..1  1571.783189: console:              [ 1571.540656] XXX PREEMPT                                                                     
kworker/-1603    0d.h1  1571.791953: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.791953: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.791957: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0d.h1  1571.791959: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-1603    0d.h1  1571.791960: print:                (nil)s: XXX TIMER                                                                        (1)


Let's look at the above. My trace_printk() is the "start 0" which is
the first iteration of the 100 printk()s you are doing. The "console"
is the printk() tracepoint of your worker thread printk.

The other "print:" events are my trace_printk()s that replaced the
printk() from your interrupt handler. I added a counter to see which
iteration it is. Note, trace_printk() is much faster than printk()
writing to the log buffer (let alone writing to the console).


kworker/-1603    0d.h1  1571.791960: print:                (nil)s: XXX TIMER                                                                        (2)
kworker/-1603    0d.h1  1571.791961: print:                (nil)s: XXX TIMER                                                                        (3)
kworker/-1603    0d.h1  1571.791961: print:                (nil)s: XXX TIMER                                                                        (4)
kworker/-1603    0d.h1  1571.791962: print:                (nil)s: XXX TIMER                                                                        (5)

[..]

kworker/-1603    0d.h1  1571.794473: print:                (nil)s: XXX TIMER                                                                        (9992)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9993)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9994)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9995)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9996)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-1603    0d.h1  1571.794477: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.794477: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.794478: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794481: softirq_entry:        vec=1 [action=TIMER]
kworker/-1603    0..s1  1571.794482: softirq_exit:         vec=1 [action=TIMER]
kworker/-1603    0..s1  1571.794482: softirq_entry:        vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794484: softirq_exit:         vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794484: softirq_entry:        vec=9 [action=RCU]
kworker/-1603    0..s1  1571.794494: softirq_exit:         vec=9 [action=RCU]
kworker/-1603    0...1  1571.794495: print:                (nil)s: end 0

Here we finish the first printk() from the worker thread. The print
started at 1571.783182 and finished at 1571.794495. That's over 11
milliseconds! And with preemption disabled. No offloading this to
another printk thread will fix that. And this printing isn't even done
yet. This is the first of 100 loops!

Let's look at the interrupt too. Remember, trace_printk() is faster
than printk() writing to its log buffers. No lock is needed as
trace_printk() is per_cpu and lockless. That said, your example started
at 1571.791953 and finished at 1571.794494 from interrupt context.
That's 2.5 milliseconds! Worse yet, this is repeated for 99 more
times!!!


kworker/-1603    0...1  1571.794495: print:                (nil)s: start 1
kworker/-1603    0d..1  1571.794496: console:              [ 1571.551967] XXX PREEMPT                                                                     
kworker/-1603    0d.h1  1571.803089: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.803089: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.803094: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (1)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (2)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (3)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (4)

[...]

kworker/-1603    0dNh1  1572.860286: print:                (nil)s: XXX TIMER                                                                        (9996)
kworker/-1603    0dNh1  1572.860286: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-1603    0dNh1  1572.860287: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-1603    0dNh1  1572.860287: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-1603    0dNh1  1572.860288: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0dNh1  1572.860288: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0dNh1  1572.860289: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0.N.1  1572.860291: print:                (nil)s: end 99

Now I find the last loop, and here we are at 1572.860291, which is over
1 second! from the time it started. I did have a serial console set up
for this test, so the printks were slow. Let me try this with no serial
console...

It does fare better...

It got to 19 prints before the interrupt triggered:

kworker/-196     0...1    70.474589: print:                (nil)s: end 18
kworker/-196     0...1    70.474597: print:                (nil)s: start 19
kworker/-196     0d..1    70.474599: console:              [   70.472301] XXX PREEMPT                                                                     
kworker/-196     0d.h1    70.474646: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-196     0d.h1    70.474646: print:                (nil)s: XXX TIMER                                                                        (1)

But 10,000 printks in interrupt context still take a very long time. That
doesn't improve.

kworker/-196     0d.h1    70.479854: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-196     0d.h1    70.479855: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0d.h1    70.479855: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0d.h1    70.479858: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0d.h1    70.479859: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0d.H1    70.479867: irq_handler_entry:    irq=27 name=em1
kworker/-196     0d.H1    70.479871: softirq_raise:        vec=3 [action=NET_RX]
kworker/-196     0d.H1    70.479871: irq_handler_exit:     irq=27 ret=handled
kworker/-196     0..s1    70.479872: softirq_entry:        vec=1 [action=TIMER]
kworker/-196     0..s1    70.479874: softirq_exit:         vec=1 [action=TIMER]
kworker/-196     0..s1    70.479874: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0..s1    70.479875: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0..s1    70.479876: softirq_entry:        vec=3 [action=NET_RX]
kworker/-196     0..s1    70.479970: softirq_exit:         vec=3 [action=NET_RX]
kworker/-196     0...1    70.479972: print:                (nil)s: end 19

Interrupt context started at 70.474646 and ended at 70.479970. For some
reason, this one took over 5 milliseconds in interrupt context!

Looking at the next interrupt, which happened at 29:

kworker/-196     0...1    70.480300: print:                (nil)s: end 28
kworker/-196     0...1    70.480300: print:                (nil)s: start 29
kworker/-196     0d..1    70.480301: console:              [   70.478004] XXX PREEMPT                                                                     
kworker/-196     0d.h1    70.480830: print:                (nil)s: XXX TIMER                                                                        (0)
[..]
kworker/-196     0d.h1    70.484455: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0d.h1    70.484455: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0d.h1    70.484456: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0d.h1    70.484456: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0d.h1    70.484457: softirq_raise:        vec=7 [action=SCHED]
kworker/-196     0d.H1    70.484458: irq_handler_entry:    irq=25 name=ahci[0000:00:1f.2]
kworker/-196     0d.H2    70.484466: softirq_raise:        vec=4 [action=BLOCK]
kworker/-196     0d.H1    70.484466: irq_handler_exit:     irq=25 ret=handled
kworker/-196     0..s1    70.484466: softirq_entry:        vec=1 [action=TIMER]
kworker/-196     0..s1    70.484467: softirq_exit:         vec=1 [action=TIMER]
kworker/-196     0..s1    70.484467: softirq_entry:        vec=7 [action=SCHED]
kworker/-196     0..s1    70.484469: softirq_exit:         vec=7 [action=SCHED]
kworker/-196     0..s1    70.484469: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0d.s1    70.484484: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0..s1    70.484484: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0..s1    70.484484: softirq_entry:        vec=4 [action=BLOCK]
kworker/-196     0.Ns1    70.484518: softirq_exit:         vec=4 [action=BLOCK]
kworker/-196     0.Ns1    70.484519: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0dNs1    70.484532: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0.Ns1    70.484532: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0.N.1    70.484534: print:                (nil)s: end 29

It took 3 milliseconds. Of course I'm also counting the softirqs that
need to catch up.

Finally, it ends at:

kworker/-196     0dNh1    70.525877: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-196     0dNh1    70.525877: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0dNh1    70.525878: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0dNh1    70.525879: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0dNh1    70.525879: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0.N.1    70.525880: print:                (nil)s: end 98
kworker/-196     0.N.1    70.525880: print:                (nil)s: start 99
kworker/-196     0dN.1    70.525881: console:              [   70.523584] XXX PREEMPT                                                                     
kworker/-196     0.N.1    70.525998: print:                (nil)s: end 99

With the start at:

kworker/-196     0...1    70.473685: print:                (nil)s: start 1

So, without a slow console, we only had 52 milliseconds with preemption
disabled. As this is all happening on a single CPU, and I haven't
seen issues where one CPU caused problems except for sysrq-t and
other debugging operations, I highly doubt any OOMs could trigger this.
If OOM started to spam the machine, it would most certainly do it on
multiple CPUs, where my patch would indeed have an effect.

This is a very extreme example (of how not to write code), and I believe
it is totally unrealistic. PLEASE STOP THAT.

My version of the module I used is at the bottom of this email.

-- Steve

> 
> 	hrtimer_forward_now(&printk_timer, timer_interval);
> 	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
> }	
> 
> static void preempt_printk_workfn(struct work_struct *work)
> {
> 	int i;
> 
> 	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> 	printk_timer.function = printk_timerfn;
> 	timer_interval = ktime_set(0, NSEC_PER_MSEC);
> 	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);
> 
> 	while (!READ_ONCE(stop_testing)) {
> 		preempt_disable();
> 		WRITE_ONCE(in_printk, true);
> 		for (i = 0; i < 100; i++)
> 			printk("%-80s\n", "XXX PREEMPT");
> 		WRITE_ONCE(in_printk, false);
> 		preempt_enable();
> 		msleep(1);
> 	}
> }
> static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);
> 
> static int __init test_init(void)
> {
> 	queue_work_on(0, system_wq, &preempt_printk_work);
> 	return 0;
> }
> 
> static void __exit test_exit(void)
> {
> 	WRITE_ONCE(stop_testing, true);
> 	flush_work(&preempt_printk_work);
> 	hrtimer_cancel(&printk_timer);
> }
> 
> module_init(test_init);
> module_exit(test_exit);
> MODULE_LICENSE("GPL");

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 10000; i++)
			__trace_printk(0,"%-80s (%d)\n", "XXX TIMER",i);

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}	

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++) {
			__trace_printk(0,"start %i\n", i);
			printk("%-80s\n", "XXX PREEMPT");
			__trace_printk(0,"end %i\n", i);
		}
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static void finish(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}
static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-22  4:19             ` Steven Rostedt
@ 2017-12-28  6:48               ` Sergey Senozhatsky
  2017-12-28 10:07                 ` Sergey Senozhatsky
  2018-01-09 20:06               ` Tejun Heo
  1 sibling, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-28  6:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael Wysocki, Pavel Machek,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky


Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem.  We see
> > these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> > pressure is high and OOM triggers and printk starts printing out OOM
> > warnings; then, a network packet comes in which triggers allocations
> > in the network layer, which fails due to memory pressure, which then
> > generates memory allocation failure messages which then generates
> > netconsole packets which then tries to allocate more memory and so on.
> 
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much that it makes the first
thing your patch depends on questionable - that if CPUA is stuck
   in console_unlock() printing other CPUs' messages, it's
   because there are currently printk()-s happening on CPUB-CPUZ.

which is not always true. with preemption enabled, CPUA can be stuck in
console_unlock() because printk()-s from CPUB-CPUZ could have happened
seconds or even minutes ago. I have demonstrated it with the traces; it
didn't convince you. OK, I sat down and went through Tetsuo's reports
starting from 2016:

	https://marc.info/?l=linux-kernel&m=151375384500555

and Tetsuo has reported several times that the printing CPU can sleep for
minutes with console_sem locked, while other CPUs are happy to
printk()->log_store() as much as they want. and that's why I believe
that "The hand off should happen either way" is dangerous, especially
when we hand off from a preemptible context to an atomic context. you
don't think that this is a problem, because your patch requires the logbuf to
be small enough for any atomic context (irq, under spin_lock, etc.) to be
able to print out all the pending messages to all registered consoles [we
print to consoles sequentially] within watchdog_threshold seconds (10
seconds by default). but who actually adjusts the size of the logbuf so that
console_unlock() can print the entire logbuf under 10 seconds?
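
(for reference, the logbuf size comes from CONFIG_LOG_BUF_SHIFT - 2^17,
i.e. 128KB, by default - or from the log_buf_len= boot parameter, e.g.

	log_buf_len=4M

and it typically gets bumped up, not down, precisely because a smaller
buffer means losing messages sooner.)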


> > It's just that there's no one else to give that flushing duty too, so
> > the pingpoinging that your patch implements can't really help
> > anything.
> > 
> > You argue that it isn't made worse by your patch, which may be true
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way for the actual
> > solution.  It's a weird argument to push an approach which is
> > fundamentally broken.  Let's please stop that.
> 
> BULLSHIT!
> 
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
> 
> It has already been proven that it can solve problems:
> 
>   http://lkml.kernel.org/r/20171219143147.GB15210@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios;
and he has sent us all a pretty clear warning:

 : Therefore, while it is true that any approach would survive my
 : environment, it is dangerous to assume that any approach is safe
 : for my customer's enterprise servers.

https://marc.info/?l=linux-kernel&m=151377161705573


when it comes to lockups, the printk() design is so flexible that one can
justify nearly every patch, no matter how many regressions it introduces,
by simply saying:
	"but the same lockup scenario could have happened even before my
	 patch. it's just printk() was designed this way. you were lucky
	 enough not to hit the problem before; now you are less lucky".

been there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent. basically what we had
before was

	bad_function()
	 printk()
	  vprintk_emit()
	  {
	    if (oops_in_progress)
	       can_offload_to_printk = false;

	    if (can_offload_to_printk)
	       wake_up(printk_kthread);
	    else if (console_trylock())
	       console_unlock();
	  }


the bad things:

- first, we do not take into account the fact that printk_kthread can
  already be active.

- second, we do not take into account the fact that printk_kthread can
  be way-way behind - a huge gap between `console_seq' and `log_next_seq'.

- third, we do not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, and neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  the console_sem it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs. right before
  it starts printing the next pending logbuf line (every line).
  IOW, instead of that:

  console_unlock()
  {
      again:
           for (;;) {
              call_console_drivers();
           }

           up();

           if (more_pending_messages && down_trylock())
               goto again;
  }

  it now does this:

  console_unlock()
  {
          preempt_disable();
     again:
          for (;;) {
              if (something_is_not_right)
                 break;
              if (printing_for_too_long)
                 break;
              call_console_drivers();
          }

          up();
          preempt_enable();

          if (!something_is_not_right && !printing_for_too_long &&
                           more_pending_messages && down_trylock())
              goto again;
  }
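
  where "printing_for_too_long" can be as trivial as something like this
  (an illustrative sketch, not the exact code from the patch set):

  static bool printing_for_too_long(unsigned long printing_start)
  {
          /* back off after roughly half of the watchdog threshold */
          return time_after(jiffies,
                            printing_start + watchdog_thresh * HZ / 2);
  }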



so if CPUA declares an emergency (oops_in_progress, etc.) then printk_kthread,
if it's running, will get out of CPUA's way as soon as possible. so I
don't think we are any *more likely* to lose relevant information on
kernel crashes because of printk_kthread.

and I'm actually thinking about returning back the old vprintk_emit()
behavior

       vprintk_emit()
       {
+         preempt_disable();
         if (console_trylock())
             console_unlock();
+         preempt_enable();
       }

and letting console_unlock() offload when needed. preemption in
vprintk_emit() is no good.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-28  6:48               ` Sergey Senozhatsky
@ 2017-12-28 10:07                 ` Sergey Senozhatsky
  2017-12-29 13:59                   ` Tetsuo Handa
  0 siblings, 1 reply; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-28 10:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Petr Mladek, Sergey Senozhatsky,
	Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
	Pavel Machek, Tetsuo Handa, linux-kernel

On (12/28/17 15:48), Sergey Senozhatsky wrote:
[..]
> and I'm actually thinking about returning back the old vprintk_emit()
> behavior
> 
>        vprintk_emit()
>        {
> +         preempt_disable();
>          if (console_trylock())
>              console_unlock();
> +         preempt_enable();
>        }

but I am not going to.
it's outside of the printk_kthread scope. and, besides, not every CPU which
is looping in console_unlock() came there via printk(). so by disabling
preemption in console_unlock() we cover more cases.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-28 10:07                 ` Sergey Senozhatsky
@ 2017-12-29 13:59                   ` Tetsuo Handa
  2017-12-31  1:44                     ` Sergey Senozhatsky
  0 siblings, 1 reply; 79+ messages in thread
From: Tetsuo Handa @ 2017-12-29 13:59 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: rostedt, tj, pmladek, sergey.senozhatsky, jack, akpm, peterz,
	rjw, pavel, linux-kernel

Sergey Senozhatsky wrote:
> On (12/28/17 15:48), Sergey Senozhatsky wrote:
> [..]
> > and I'm actually thinking about returning back the old vprintk_emit()
> > behavior
> > 
> >        vprintk_emit()
> >        {
> > +         preempt_disable();
> >          if (console_trylock())
> >              console_unlock();
> > +         preempt_enable();
> >        }
> 
> but am not going to.
> it's outside of printk_kthread scope. and, besides, not every CPU which
> is looping on console_unlock() came there via printk(). so by disabling
> preemption in console_unlock() we cover more cases.

Just an idea: Do we really need to use a semaphore for console_sem?

Is it possible to replace it with a spinlock? Then, I feel that we could write
to consoles from non-process context (i.e. soft or hard IRQ context), writing
only one log record (or even one byte) at a time (i.e. write one record from
one context, and defer all remaining records by "somehow" scheduling that
context to be called again).

Since process context might fail to let the printk kernel thread run for a
long period due to many threads waiting to run, I thought that interrupt
context might fit better if we can "somehow" chain interrupt contexts.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-29 13:59                   ` Tetsuo Handa
@ 2017-12-31  1:44                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2017-12-31  1:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, rostedt, tj, pmladek,
	sergey.senozhatsky, jack, akpm, peterz, rjw, pavel, linux-kernel

Hello,

On (12/29/17 22:59), Tetsuo Handa wrote:
[..]
> Just an idea: Do we really need to use a semaphore for console_sem?
> 
> Is it possible to replace it with a spinlock? Then, I feel that we can write
> to consoles from non-process context (i.e. soft or hard IRQ context), with
> write only one log (or even one byte) at a time (i.e. write one log from one
> context, and defer all remaining logs by "somehow" scheduling for calling
> that context again).
> 
> Since process context might fail to allow printk kernel thread to run for
> long period due to many threads waiting for run, I thought that interrupt
> context might fit better if we can "somehow" chain interrupt contexts.


that's a good question. printk() itself, indeed, does not care that much
[whether it's a semaphore or a spinlock]. but the whole picture is more
complex. I can copy-paste (sorry for that) one of my previous emails to
give a brief (and, I'm sure, incomplete) idea.

====================

the real purpose of console_sem is to synchronize all events that can
happen to VT, fbcon, TTY, video, etc. and there are many events that
can happen to VT/fbcon. and some of those events can sleep - that's
where printk() can suffer. and this is why printk() is not different
from any  other console_sem users -- printk() uses that lock in order
to synchronize its own events: to have only one printing CPU, to prevent
concurrent console drivers list modification, to prevent concurrent consoles
modification, and so on.

let's take VT and fbcon for simplicity.

the events are.

1) IOCTL from user space
   they may involve things like resizing, scrolling, rotating, etc.

   take a look at drivers/tty/vt/vt_ioctl.c  vt_ioctl().
   we need to take console_sem there because we modify the very
   important things - size, font maps, etc. we don't want those changes
   to mess with possibly active print outs happening from other CPUs.

2) timer events and workqueue events
   even cursor blinking must take console_sem. because it modifies the
   state of console/screen. take a look at drivers/video/fbdev/core/fbcon.c
   show_cursor_blink() for example.

   and take a look at fbcon_add_cursor_timer() in drivers/video/fbdev/core/fbcon.c

3) foreground console may change. a video driver may be initialized and
   registered.

4) PM events
   for example, drivers/video/fbdev/aty/radeon_pm.c   radeonfb_pci_suspend()

5) TTY write from user space
   when user space wants to write anything to the console it goes through
   n_tty -> con_write() -> do_con_write().

 CPU: 1 PID: 1 Comm: systemd
 Call Trace:
  do_con_write+0x4c/0x1a5f
  con_write+0xa/0x1d
  n_tty_write+0xdb/0x3c5
  tty_write+0x191/0x223
  n_tty_receive_buf+0x8/0x8
  do_loop_readv_writev.part.23+0x58/0x89
  do_iter_write+0x98/0xb1
  vfs_writev+0x62/0x89

take a look at drivers/tty/vt/vt.c do_con_write()


it does a ton of things. why - because we need to scroll the console;
we need to wrap around the lines; we need to process control characters
- like \r or \n and so on - and modify the console state accordingly;
we need to do UTF8/ASCII/etc. all of these things cannot run concurrently
with IOCTLs that modify the font map or resize the console, or flip it,
or rotate it.

take a look at lf() -> con_scroll() -> fbcon_scroll() // drivers/video/fbdev/core/fbcon.c

we also don't want printk() to mess with do_con_write(). including
printk() from IRQ.

6) even more TTY
   I suspect that TTY may be invoked from IRQ.

7) printk() write  (and occasional kmsg_dump dumpers, e.g. arch/um/kernel/kmsg_dump)

   printk() goes through console_unlock()->vt_console_print().
   and it, basically, must handle all the things that TTY write does:
   handle console chars properly, do scrolling, wrapping, etc. and we
   don't want anything else to jump in and mess with us at this stage.
   that's why we use console_sem in printk.c - to serialize all the
   events... including concurrent printk() from other CPUs. that's why
   we do console_trylock() in vprintk_emit().

8) even more printk()
   printk() can be called from IRQ. console_sem stops it if some of
   the consoles can't work in IRQ context right now.

9) consoles have notifiers

/*
 * We defer the timer blanking to work queue so it can take the console mutex
 * (console operations can still happen at irq time, but only from printk which
 * has the console mutex. Not perfect yet, but better than no locking
 */
static void blank_screen_t(unsigned long dummy)
{
        blank_timer_expired = 1;
        schedule_work(&console_work);
}

so console_sem is also used to, basically, synchronize IRQs/etc.

10) I suspect that some consoles can do things with console_sem from IRQ
   context.

and so on. we really use console_sem as a big-kernel-lock.


so where might console_sem users sleep? in tons of places...

like ark_pci_suspend()   console_lock(); mutex_lock(par);
or ark_pci_resume()       console_lock(); mutex_lock();
or con_install()          console_lock(); vc_allocate() -> kzalloc(GFP_KERNEL)

and so on and on and on.

and then there are paths that do

        mutex_lock(); schedule();
and another CPU does
        console_lock(); mutex_lock();

so it sleeps on a mutex, with console_sem locked, and we can't even print
anything. printk() has to start losing messages at some point and nothing
can help it. except for the flush on panic -- we don't care about
console_sem there.

printk() on its own can sleep with console_sem locked:
    - preemption in console_unlock() printing loop.

vprintk_emit()
 console_unlock()
  for (;;) {
    call_console_drivers();
    local_irq_restore()
    <<<<< preemption or cond_resched() >>>>>
  }

if the system is not healthy (OOM, etc.) then preemption in
console_unlock() can block printk messages for a very long
time.

====================

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (13 preceding siblings ...)
  2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
@ 2018-01-05  2:54 ` Sergey Senozhatsky
  14 siblings, 0 replies; 79+ messages in thread
From: Sergey Senozhatsky @ 2018-01-05  2:54 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Petr Mladek, linux-kernel, Sergey Senozhatsky

On (12/04/17 22:48), Sergey Senozhatsky wrote:
> 	A new version, yet another rework. Lots of changes, e.g. hand off
> control based on Steven's patch. Another change is that this time around
> we finally have a kernel module to test printk offloading (YAYY!). The
> module tests a bunch use cases; we also have trace printk events to...
> trace offloading. I'll post the testing script and test module in reply
> to this mail. So... let's have some progress ;) The code is not completely
> awesome, but not tremendously difficult at the same time. We can verify
> the approach/design (we have tests and traces now) first and then start
> improving the code.

a quick note:

the patch set and test_printk module are obsolete.
I have "reworked" both.

	-ss

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2017-12-22  4:19             ` Steven Rostedt
  2017-12-28  6:48               ` Sergey Senozhatsky
@ 2018-01-09 20:06               ` Tejun Heo
  2018-01-09 22:08                 ` Tetsuo Handa
  2018-01-09 22:08                 ` Steven Rostedt
  1 sibling, 2 replies; 79+ messages in thread
From: Tejun Heo @ 2018-01-09 20:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello, Steven.

My apologies for the late reply.  Was traveling and then got sick.

On Thu, Dec 21, 2017 at 11:19:32PM -0500, Steven Rostedt wrote:
> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

Sergey's patch seems more complex (and probably handles more
requirements) but my original patch was pretty simple.

http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com

> > static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> > {
> > 	int i;
> > 
> > 	if (READ_ONCE(in_printk))
> > 		for (i = 0; i < 10000; i++)
> > 			printk("%-80s\n", "XXX TIMER");
> 
> WTF!
> 
> You are printing 10,000 printk messages from an interrupt context???
> And to top it off, I ran this on my box, switching printk() to
> trace_printk() (which is extremely low overhead). And it is triggered
> on the same CPU that did the printk() itself on. Yeah, there is no hand
> off, because you are doing a shitload of printks on one CPU and nothing
> on any of the other CPUs. This isn't the problem that my patch was set
> out to solve, nor is it a very realistic problem. I added a counter to
> the printk as well, to keep track of how many printks there were:

The code might suck but I think this does replicate what we've been
seeing regularly in the fleet.  The console side is pretty slow - IPMI
faithfully emulating serial console.  I don't know it's doing 115200
or even slower.  Please consider something like the following.

* The kernel isn't preemptible.  Machine runs out of memory, hits OOM
  condition.  It starts printing OOM information.

* Netconsole tries to send out OOM messages and tries memory
  allocation, which fails, which then prints allocation-failure messages.
  Because this happens while we're already printing, the messages just
  get queued to the buffer.  This repeats.

* We're still in the middle of OOM and haven't killed anybody yet, so
  memory stays short and the printk ring buffer keeps getting filled by
  the above.  Also, after a bit, RCU stall warnings kick in too,
  producing more messages.

What's happening is that the OOM killer is trapped flushing printk,
failing to clear the memory condition, and that leads irq / softirq
contexts to produce messages faster than they can be flushed.  I don't see
how we'd be able to clear the condition without introducing an
independent context to flush the ring buffer.
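
A toy model of that feedback loop, with entirely made-up names (this is
not the real printk code, just the shape of the problem):

	static int pending = 1;		/* records already sitting in the buffer */

	static void oom_cpu_flush_loop(void)
	{
		while (pending > 0) {
			pending += 2;	/* irq/softirq: netconsole failures, RCU stalls */
			pending -= 1;	/* one record pushed out the slow console */
		}
		/* the OOM kill can only proceed once we get here -- i.e. never */
	}

As long as the producers outpace the slow console, the loop never
terminates and the context that entered it never gets its CPU back.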

Again, this is an actual problem that we've been seeing fairly
regularly in production machines.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 20:06               ` Tejun Heo
@ 2018-01-09 22:08                 ` Tetsuo Handa
  2018-01-09 22:17                   ` Tejun Heo
  2018-01-09 22:08                 ` Steven Rostedt
  1 sibling, 1 reply; 79+ messages in thread
From: Tetsuo Handa @ 2018-01-09 22:08 UTC (permalink / raw)
  To: tj, rostedt
  Cc: pmladek, sergey.senozhatsky, jack, akpm, peterz, rjw, pavel,
	linux-kernel, sergey.senozhatsky.work

Tejun Heo wrote:
> The code might suck but I think this does replicate what we've been
> seeing regularly in the fleet.  The console side is pretty slow - IPMI
> faithfully emulating a serial console.  I don't know whether it's doing
> 115200 or even slower.  Please consider something like the following.

Emulated serial consoles tend to be slow.

> 
> * The kernel isn't preemptible.  Machine runs out of memory, hits OOM
>   condition.  It starts printing OOM information.
> 
> * Netconsole tries to send out the OOM messages and tries a memory
>   allocation, which fails and in turn prints allocation-failure
>   messages.  Because this happens while we're already printing, those
>   messages just get queued to the buffer.  This repeats.

What? Does netconsole need to allocate memory when sending? I assume it does not.

> 
> * We're still in the middle of OOM and haven't killed anybody yet, so
>   memory stays short and the printk ring buffer keeps getting filled by
>   the above.  Also, after a bit, RCU stall warnings kick in too,
>   producing more messages.

And mutex_trylock(&oom_lock) keeps wasting CPU. :-(
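
Roughly, from memory rather than a verbatim quote of mm/page_alloc.c,
the allocation slow path does something like

	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}

so every task that loses the race sleeps a jiffy and retries the whole
slow path, while the lock holder sits in printk.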

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 20:06               ` Tejun Heo
  2018-01-09 22:08                 ` Tetsuo Handa
@ 2018-01-09 22:08                 ` Steven Rostedt
  2018-01-09 22:17                   ` Tejun Heo
  1 sibling, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2018-01-09 22:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Tue, 9 Jan 2018 12:06:20 -0800
Tejun Heo <tj@kernel.org> wrote:


> What's happening is that the OOM killer is trapped flushing printk,
> failing to clear the memory condition, and that leads irq / softirq
> contexts to produce messages faster than they can be flushed.  I don't see
> how we'd be able to clear the condition without introducing an
> independent context to flush the ring buffer.
> 
> Again, this is an actual problem that we've been seeing fairly
> regularly in production machines.

But your test case is pinned to a single CPU. You have a work queue
that does a printk and triggers a timer interrupt to go off on that
same CPU. Then the timer interrupt does 10,000 printks, over and over
on the same CPU. Of course that will be an issue, and it is NOT similar
to the scenario that you listed above.

The scenario you listed would affect multiple CPUs and multiple CPUs
would be flooding printk. In that case my patch WILL help. Because with
the current method, the first CPU to do the printk will get stuck doing
the printk for ALL OTHER CPUs. With my patch, the printk load will
migrate around and there will not be a single CPU that is stuck.
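
Very roughly, the idea looks like this (a sketch, not the actual patch;
the helpers below are made up, apart from the obvious kernel primitives):

	static atomic_t console_waiter;

	void printk_sketch(const char *msg)
	{
		log_store(msg);			/* append to the ring buffer */

		if (!console_trylock()) {
			/* someone else is printing: ask for a hand-off */
			atomic_set(&console_waiter, 1);
			while (atomic_read(&console_waiter))
				cpu_relax();	/* current owner clears this and stops */
		}
		/*
		 * We now own the console: flush until the buffer is empty
		 * or another CPU shows up as a waiter, then hand off.
		 */
		console_flush_and_handoff();
	}

The printing duty keeps migrating to whoever printk'd last, so the load
gets spread instead of pinning the first CPU forever.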

So no, your test is not realistic.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:08                 ` Steven Rostedt
@ 2018-01-09 22:17                   ` Tejun Heo
  2018-01-09 22:47                     ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Tejun Heo @ 2018-01-09 22:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello, Steven.

On Tue, Jan 09, 2018 at 05:08:47PM -0500, Steven Rostedt wrote:
> The scenario you listed would affect multiple CPUs and multiple CPUs
> would be flooding printk. In that case my patch WILL help. Because with
> the current method, the first CPU to do the printk will get stuck doing
> the printk for ALL OTHER CPUs. With my patch, the printk load will
> migrate around and there will not be a single CPU that is stuck.

Maybe it can break out eventually but that can take a really long
time.  It's OOM.  Most of userland is waiting for reclaim.  There
isn't all that much going on outside that and there can only be one
CPU which is OOMing.  The kernel isn't gonna be all that chatty.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:08                 ` Tetsuo Handa
@ 2018-01-09 22:17                   ` Tejun Heo
  2018-01-11 11:14                     ` Tetsuo Handa
  0 siblings, 1 reply; 79+ messages in thread
From: Tejun Heo @ 2018-01-09 22:17 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rostedt, pmladek, sergey.senozhatsky, jack, akpm, peterz, rjw,
	pavel, linux-kernel, sergey.senozhatsky.work

On Wed, Jan 10, 2018 at 07:08:30AM +0900, Tetsuo Handa wrote:
> > * Netconsole tries to send out the OOM messages and tries a memory
> >   allocation, which fails and in turn prints allocation-failure
> >   messages.  Because this happens while we're already printing, those
> >   messages just get queued to the buffer.  This repeats.
> 
> What? Does netconsole need to allocate memory when sending? I assume it does not.

A lot of network drivers do, unfortunately.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:17                   ` Tejun Heo
@ 2018-01-09 22:47                     ` Steven Rostedt
  2018-01-09 22:53                       ` Tejun Heo
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2018-01-09 22:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Tue, 9 Jan 2018 14:17:05 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Steven.
> 
> On Tue, Jan 09, 2018 at 05:08:47PM -0500, Steven Rostedt wrote:
> > The scenario you listed would affect multiple CPUs and multiple CPUs
> > would be flooding printk. In that case my patch WILL help. Because with
> > the current method, the first CPU to do the printk will get stuck doing
> > the printk for ALL OTHER CPUs. With my patch, the printk load will
> > migrate around and there will not be a single CPU that is stuck.
> 
> Maybe it can break out eventually but that can take a really long
> time.  It's OOM.  Most of userland is waiting for reclaim.  There
> isn't all that much going on outside that and there can only be one
> CPU which is OOMing.  The kernel isn't gonna be all that chatty.

Are you saying that the OOM is stuck printing over and over on a single
CPU? Perhaps we should fix THAT.

-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:47                     ` Steven Rostedt
@ 2018-01-09 22:53                       ` Tejun Heo
  2018-01-10  7:18                         ` Steven Rostedt
  0 siblings, 1 reply; 79+ messages in thread
From: Tejun Heo @ 2018-01-09 22:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello, Steven.

On Tue, Jan 09, 2018 at 05:47:50PM -0500, Steven Rostedt wrote:
> > Maybe it can break out eventually but that can take a really long
> > time.  It's OOM.  Most of userland is waiting for reclaim.  There
> > isn't all that much going on outside that and there can only be one
> > CPU which is OOMing.  The kernel isn't gonna be all that chatty.
> 
> Are you saying that the OOM is stuck printing over and over on a single
> > CPU? Perhaps we should fix THAT.

I'm not sure what you meant but OOM code isn't doing anything bad
other than excluding others from doing OOM kills simultaneously, which
is what we want, and printing a lot of messages; it then gets caught
up in a positive feedback loop.

To me, the whole point of this effort is preventing printk messages
from causing significant or critical disruptions to overall system
operation.  IOW, it's rather dumb if the machine goes down because
somebody printk'd wrong or just failed to foresee the combinations of
events which could lead to such conditions.

It's not like we don't know how to fix this either.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:53                       ` Tejun Heo
@ 2018-01-10  7:18                         ` Steven Rostedt
  2018-01-10 14:04                           ` Tejun Heo
  0 siblings, 1 reply; 79+ messages in thread
From: Steven Rostedt @ 2018-01-10  7:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Tue, 9 Jan 2018 14:53:56 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Steven.
> 
> On Tue, Jan 09, 2018 at 05:47:50PM -0500, Steven Rostedt wrote:
> > > Maybe it can break out eventually but that can take a really long
> > > time.  It's OOM.  Most of userland is waiting for reclaim.  There
> > > isn't all that much going on outside that and there can only be one
> > > CPU which is OOMing.  The kernel isn't gonna be all that chatty.  
> > 
> > Are you saying that the OOM is stuck printing over and over on a single
> > CPU? Perhaps we should fix THAT.
> 
> I'm not sure what you meant but OOM code isn't doing anything bad

My point is that your test is only hammering at a single CPU. You say
it is the scenario you see, which means that the OOM is printing out
more than it should, because if it prints it out once, it should not
print it out again for the same process, or go into a loop doing it
over and over on a single CPU. That would be a bug in the
implementation.

> other than excluding others from doing OOM kills simultaneously, which
> is what we want, and printing a lot of messages; it then gets caught
> up in a positive feedback loop.
> 
> To me, the whole point of this effort is preventing printk messages
> from causing significant or critical disruptions to overall system
> operation.

I agree, and my patch helps with this tremendously, if we are not doing
something stupid like calling printk thousands of times in an interrupt
handler, over and over on a single CPU.

>  IOW, it's rather dumb if the machine goes down because
> somebody printk'd wrong or just failed to foresee the combinations of
> events which could lead to such conditions.

I'd still like to see a trace of a real situation.

> 
> It's not like we don't know how to fix this either.

But we don't want the fix to introduce regressions, and offloading
printk does. Heck, the current fixes to printk have caused issues for me
in my own debugging. For instance, we can no longer do large dumps of
printk from NMI context, which I used to do when detecting a lockup and
then doing a task list dump of all tasks, or even a ftrace_dump_on_oops.

http://lkml.kernel.org/r/20180109162019.GL3040@hirez.programming.kicks-ass.net


-- Steve

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-10  7:18                         ` Steven Rostedt
@ 2018-01-10 14:04                           ` Tejun Heo
  0 siblings, 0 replies; 79+ messages in thread
From: Tejun Heo @ 2018-01-10 14:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

Hello, Steven.

On Wed, Jan 10, 2018 at 02:18:27AM -0500, Steven Rostedt wrote:
> My point is that your test is only hammering at a single CPU. You say
> it is the scenario you see, which means that the OOM is printing out
> more than it should, because if it prints it out once, it should not
> print it out again for the same process, or go into a loop doing it
> over and over on a single CPU. That would be a bug in the
> implementation.

That's not what's happening.  You're not actually reading what I'm
writing.  Can you please go back and re-read the scenario I've been
describing over and over again.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
  2018-01-09 22:17                   ` Tejun Heo
@ 2018-01-11 11:14                     ` Tetsuo Handa
  0 siblings, 0 replies; 79+ messages in thread
From: Tetsuo Handa @ 2018-01-11 11:14 UTC (permalink / raw)
  To: tj
  Cc: rostedt, pmladek, sergey.senozhatsky, jack, akpm, peterz, rjw,
	pavel, linux-kernel, sergey.senozhatsky.work

Tejun Heo wrote:
> On Wed, Jan 10, 2018 at 07:08:30AM +0900, Tetsuo Handa wrote:
> > > * Netconsole tries to send out the OOM messages and tries a memory
> > >   allocation, which fails and in turn prints allocation-failure
> > >   messages.  Because this happens while we're already printing, those
> > >   messages just get queued to the buffer.  This repeats.
> > 
> > What? Does netconsole need to allocate memory when sending? I assume it does not.
> 
> A lot of network drivers do, unfortunately.
> 

Excuse me, but can you show me an example of such traces?

Any path which is called from printk() must not (directly or indirectly)
depend on __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?id=e746bf730a76fe53b82c9e6b6da72d58e9ae3565
If it depends on such a memory allocation, it can cause an OOM lockup.
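
As a hypothetical illustration (my own example, not taken from any real
driver), an allocation reachable from a printk() path has to look like

	skb = alloc_skb(len, GFP_ATOMIC);	/* no direct reclaim */

rather than

	skb = alloc_skb(len, GFP_KERNEL);	/* __GFP_DIRECT_RECLAIM && !__GFP_NORETRY */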

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2018-01-11 11:14 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
2017-12-14 14:39   ` Sergey Senozhatsky
2017-12-15 15:55     ` Steven Rostedt
2017-12-14 15:25   ` Tejun Heo
2017-12-14 17:55     ` Steven Rostedt
2017-12-14 18:11       ` Tejun Heo
2017-12-14 18:21         ` Steven Rostedt
2017-12-22  0:09           ` Tejun Heo
2017-12-22  4:19             ` Steven Rostedt
2017-12-28  6:48               ` Sergey Senozhatsky
2017-12-28 10:07                 ` Sergey Senozhatsky
2017-12-29 13:59                   ` Tetsuo Handa
2017-12-31  1:44                     ` Sergey Senozhatsky
2018-01-09 20:06               ` Tejun Heo
2018-01-09 22:08                 ` Tetsuo Handa
2018-01-09 22:17                   ` Tejun Heo
2018-01-11 11:14                     ` Tetsuo Handa
2018-01-09 22:08                 ` Steven Rostedt
2018-01-09 22:17                   ` Tejun Heo
2018-01-09 22:47                     ` Steven Rostedt
2018-01-09 22:53                       ` Tejun Heo
2018-01-10  7:18                         ` Steven Rostedt
2018-01-10 14:04                           ` Tejun Heo
2017-12-15  2:10         ` Sergey Senozhatsky
2017-12-15  3:18           ` Steven Rostedt
2017-12-15  5:06             ` Sergey Senozhatsky
2017-12-15  6:52               ` Sergey Senozhatsky
2017-12-15 15:39                 ` Steven Rostedt
2017-12-15  8:31               ` Petr Mladek
2017-12-15  8:42                 ` Sergey Senozhatsky
2017-12-15  9:08                   ` Petr Mladek
2017-12-15 15:47                     ` Steven Rostedt
2017-12-18  9:36                     ` Sergey Senozhatsky
2017-12-18 10:36                       ` Sergey Senozhatsky
2017-12-18 12:35                         ` Sergey Senozhatsky
2017-12-18 13:51                         ` Petr Mladek
2017-12-18 13:31                       ` Petr Mladek
2017-12-18 13:39                         ` Sergey Senozhatsky
2017-12-18 14:13                           ` Petr Mladek
2017-12-18 17:46                             ` Steven Rostedt
2017-12-19  1:03                               ` Sergey Senozhatsky
2017-12-19  1:08                                 ` Steven Rostedt
2017-12-19  1:24                                   ` Sergey Senozhatsky
2017-12-19  2:03                                     ` Steven Rostedt
2017-12-19  2:46                                       ` Sergey Senozhatsky
2017-12-19  3:38                                         ` Steven Rostedt
2017-12-19  4:58                                           ` Sergey Senozhatsky
2017-12-19 14:40                                             ` Steven Rostedt
2017-12-20  7:46                                               ` Sergey Senozhatsky
2017-12-19 14:31                                     ` Michal Hocko
2017-12-20  7:10                                       ` Sergey Senozhatsky
2017-12-20 12:06                                         ` Tetsuo Handa
2017-12-21  6:52                                           ` Sergey Senozhatsky
2017-12-19  4:36                               ` Sergey Senozhatsky
2017-12-18 14:10                         ` Petr Mladek
2017-12-19  1:09                           ` Sergey Senozhatsky
2017-12-15 15:42                 ` Steven Rostedt
2017-12-15 15:19               ` Steven Rostedt
2017-12-19  0:52                 ` Sergey Senozhatsky
2017-12-19  1:03                   ` Steven Rostedt
2018-01-05  2:54 ` Sergey Senozhatsky
