All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCHv3 0/5] printk: introduce printing kernel thread
@ 2017-05-09  8:28 Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu Sergey Senozhatsky
                   ` (4 more replies)
  0 siblings, 5 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello,


	RFC


	This patch set adds a printk() kernel thread which lets us
print kernel messages to the console from a non-atomic/schedulable
context, avoiding various sorts of lockups, stalls, etc.


	A completely reworked version; for more details please
see the 0003 commit message and the code comments/documentation.


v2->v3 (Petr, Pavel, Andreas):
-- rework offloading
-- use PM notifiers
-- dropped some patches, etc. etc.

v1->v2:
-- introduce printk_emergency mode and API to switch it on/off
-- move printk_pending out of per-CPU memory
-- add printk emergency_mode sysfs node
-- switch sysrq handlers (some of them) to printk_emergency
-- cleanups, etc.

Sergey Senozhatsky (5):
  printk: move printk_pending out of per-cpu
  printk: introduce printing kernel thread
  printk: add enforce_emergency parameter
  printk: enable printk offloading
  printk: register PM notifier

 include/linux/console.h |   3 +
 kernel/printk/printk.c  | 296 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 279 insertions(+), 20 deletions(-)

-- 
2.12.2

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu
  2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-05-09  8:28 ` Sergey Senozhatsky
  2017-05-25 12:11   ` Petr Mladek
  2017-05-09  8:28 ` [RFC][PATCHv3 2/5] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Do not keep `printk_pending' in the per-CPU area. We set the following
bits of printk_pending:
a) PRINTK_PENDING_WAKEUP
	when we need to wake up klogd
b) PRINTK_PENDING_OUTPUT
	when there is pending output from deferred printk and we need
	to call console_unlock().

None of these bits controls or represents the state of a particular CPU,
so they should be global instead.

Besides, we will use `printk_pending' to control the printk kthread, so
this patch is also preparatory work.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a1aecf44ab07..2cb7f4753b76 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -401,6 +401,14 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 		printk_safe_exit_irqrestore(flags);	\
 	} while (0)
 
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static unsigned long printk_pending;
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -2709,25 +2717,15 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
-/*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
 	}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
+	if (test_and_clear_bit(PRINTK_PENDING_WAKEUP, &printk_pending))
 		wake_up_interruptible(&log_wait);
 }
 
@@ -2740,7 +2738,7 @@ void wake_up_klogd(void)
 {
 	preempt_disable();
 	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
 	preempt_enable();
@@ -2756,7 +2754,7 @@ int printk_deferred(const char *fmt, ...)
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
 
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu Sergey Senozhatsky
@ 2017-05-09  8:28 ` Sergey Senozhatsky
  2017-05-10  5:59   ` Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 3/5] printk: add enforce_emergency parameter Sergey Senozhatsky
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
sysfs param, which sets a limit on the number of lines a process can print
from console_unlock(). The value 0 corresponds to the current behavior (no
limit).

The printing offload happens from the console_unlock() function and,
briefly, looks as follows: as soon as a process prints more than
`atomic_print_limit' lines it attempts to offload printing to another
process. Since nothing guarantees that there will be another process
sleeping on the console_sem or calling printk() on another CPU
simultaneously, the patch also introduces an auxiliary kernel thread -
printk_kthread, the main purpose of which is to take over the printing
duty. The workflow thus turns into: as soon as a process prints more than
`atomic_print_limit' lines it wakes up printk_kthread and unlocks the
console_sem. So in the best case, at this point there will be at least
one process trying to lock the console_sem: printk_kthread. (There can
also be a process that was sleeping on the console_sem and was woken up
by the console semaphore up(), and concurrent printk() invocations from
other CPUs.)

But in the worst case there won't be any process ready to take over the
printing duty: it may take printk_kthread some time to become running, or
printk_kthread may never become running at all (a misbehaving scheduler,
or some other critical condition). That's why, after we wake_up()
printk_kthread, we can't immediately leave the printing loop; we must
ensure that the console_sem has a new owner before we do so. Therefore
`atomic_print_limit' is a soft limit, not a hard one: we let the task
overrun `atomic_print_limit'. At the same time, the console_unlock()
printing loop behaves differently for tasks that have exceeded
`atomic_print_limit': after every printed logbuf entry
(call_console_drivers()) such a process wakes up printk_kthread, unlocks
the console_sem and attempts to console_trylock() a bit later (if there
are any pending messages in the logbuf, of course).

In the best-case scenario either printk_kthread or some other task will
lock the console_sem, so the current printing task will see a failed
console_trylock(), which indicates a successful printing offload. In the
worst case, however, current will successfully console_trylock(), which
indicates that the offload did not take place and we can't return from
console_unlock(); so the printing task will print one more line from the
logbuf and attempt to offload printing once again, and it will continue
doing so until another process locks the console_sem or until there are
no more pending messages in the logbuf. So if everything goes wrong - we
can't wake up printk_kthread and there are no other processes sleeping
on the console_sem or trying to down() it - then we get the existing
console_unlock() behavior: print all pending messages in one shot.

We track the number of unsuccessful offloading attempts and at some
point we declare a `printk_emergency' condition and give up trying to
offload. `printk_emergency' is a new name for the printk mode in which
printk does not attempt to offload printing, but instead flushes all
pending logbuf messages (basically, the existing behaviour).

There are also other cases when we can't (or should avoid) offloading.
For example, we can't call into the scheduler from panic(), because this
may cause a deadlock. Therefore printk() has some additional places
where it switches to printk_emergency mode: for instance, once an EMERG
log level message appears in the log buffer. We also provide two new
functions that can be used when a path needs to declare a temporary
`printk_emergency' mode:

 -- printk_emergency_begin()
    Disables printk offloading. All printk() calls will attempt
    to lock the console_sem and, if successful, flush kernel log
    messages.

 -- printk_emergency_end()
    Enables printk offloading.

Offloading is not possible yet; it will be enabled in a later patch.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h |   3 +
 kernel/printk/printk.c  | 203 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 198 insertions(+), 8 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2cb7f4753b76..a113f684066c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
+#include <linux/kthread.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 	} while (0)
 
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
  */
 #define PRINTK_PENDING_WAKEUP	0x01
 #define PRINTK_PENDING_OUTPUT	0x02
@@ -445,6 +447,53 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread __read_mostly;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_enforce_emergency __read_mostly;
+/*
+ * The number of lines a task can print before offloading printing
+ * job to printk_kthread. 0 indicates 'no limit'.
+ */
+static unsigned int atomic_print_limit;
+
+module_param_named(atomic_print_limit, atomic_print_limit, uint, 0644);
+MODULE_PARM_DESC(atomic_print_limit,
+		 "max lines to print before offloading to printk kthread");
+
+static inline bool printk_offloading_enabled(void)
+{
+	return atomic_print_limit &&
+		printk_kthread &&
+		!printk_enforce_emergency &&
+		atomic_read(&printk_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void printk_emergency_begin(void)
+{
+	atomic_inc(&printk_emergency);
+}
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+	atomic_dec(&printk_emergency);
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1765,6 +1814,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	/*
+	 * Emergency level indicates that the system is unstable and, thus,
+	 * we better stop relying on wake_up(printk_kthread) and try to do
+	 * a direct printing.
+	 */
+	if (level == LOGLEVEL_EMERG)
+		printk_enforce_emergency = true;
+
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -1882,6 +1940,9 @@ static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
+void printk_emergency_begin(void) {}
+void printk_emergency_end(void) {}
+
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2155,6 +2216,85 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * number of lines a process can print from console_unlock().
+ *
+ * There is one more reason to do offloading - there might be other processes
+ * (including user space tasks) sleeping on the console_sem in uninterruptible
+ * state; and keeping the console_sem locked for long time may have a negative
+ * impact on them.
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool console_offload_printing(void)
+{
+	static struct task_struct *printing_task;
+	static unsigned long long lines_printed;
+	static unsigned long saved_csw;
+
+	if (!printk_offloading_enabled())
+		return false;
+
+	if (system_state != SYSTEM_RUNNING || oops_in_progress)
+		return false;
+
+	/* A new task - reset the counters. */
+	if (printing_task != current) {
+		lines_printed = 0;
+		printing_task = current;
+		saved_csw = current->nvcsw + current->nivcsw;
+		return false;
+	}
+
+	lines_printed++;
+	if (lines_printed < atomic_print_limit)
+		return false;
+
+	if (current == printk_kthread) {
+		/*
+		 * Reset the `lines_printed' counter, just in case if
+		 * printk_kthread is the only process left that would
+		 * down() console_sem and call console_unlock().
+		 */
+		lines_printed = 0;
+		return true;
+	}
+
+	/*
+	 * A trivial emergency enforcement - give up on printk_kthread if
+	 * we can't wake it up. This assumes that `atomic_print_limit' is
+	 * reasonably and sufficiently large.
+	 */
+	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
+			saved_csw == (current->nvcsw + current->nivcsw)) {
+		printk_enforce_emergency = true;
+		pr_crit("Declaring printk emergency mode.\n");
+		return false;
+	}
+
+	/*
+	 * Wakeup `printk_kthread' but don't reset the `lines_printed'
+	 * counter, let task overrun the limit. The idea is that
+	 *
+	 *   a) woken up printk_kthread (if succeeded)
+	 * or
+	 *   b) concurrent printk from another CPU or a task from console_sem
+	 *      wait list (if any)
+	 *
+	 * will take over, change `printing_task' and reset the counter.
+	 * If neither a) nor b) happens - we continue printing from
+	 * current process. Which is bad and can be risky, but since
+	 * we couldn't wake_up() printk_kthread, the things already
+	 * don't look normal.
+	 */
+	wake_up_process(printk_kthread);
+	return true;
+}
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2177,8 +2317,17 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry;
+	bool did_offload;
 
 	if (console_suspended) {
+		/*
+		 * Here and later, we need to clear the PENDING_OUTPUT bit
+		 * in order to avoid an infinite loop in printk_kthread
+		 * function when console_unlock() cannot flush messages
+		 * because we suspended consoles. Someone else will print
+		 * the messages from resume_console().
+		 */
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		up_console_sem();
 		return;
 	}
@@ -2200,6 +2349,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	did_offload = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2207,6 +2357,7 @@ void console_unlock(void)
 	 * console.
 	 */
 	if (!can_use_console()) {
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		console_locked = 0;
 		up_console_sem();
 		return;
@@ -2235,7 +2386,7 @@ void console_unlock(void)
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (did_offload || console_seq == log_next_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2267,9 +2418,38 @@ void console_unlock(void)
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		/*
+		 * Sometimes current task may re-acquire console_sem ahead of
+		 * just woken up printk_kthread, because we let processes to
+		 * compete for console_sem. In this case current task will jump
+		 * to `again' label (if there are any pending messages), print
+		 * one more line, break out of printing loop (we don't reset
+		 * `lines_printed' after wake_up_process() exactly for this
+		 * purpose) and do up_console_sem(); this will be repeated as
+		 * long as current task successfully console_trylock()-s.
+		 *
+		 * Eventually and hopefully, either printk_kthread or some
+		 * other task will acquire console_sem and current process will
+		 * be able to leave console_unlock(). Until this happens we
+		 * cannot be sure that anyone will take over and flush the
+		 * `logbuf' messages. So in the worst case scenario we,
+		 * basically, have old printk()/console_unlock() behaviour:
+		 * a task prints all pending logbuf messages.
+		 *
+		 * If printk_kthread never wakes up (which may indicate that
+		 * the system is unstable or something weird is going on),
+		 * then we keep jumping to `again' label printing one message
+		 * from the logbuf each time. This is a bit ugly, but at least
+		 * we will print out the logbuf. If such condition occurs,
+		 * console_offload_printing() should declare `printk_emergency'
+		 * at some point and we will stop all attempts to offload
+		 * printing.
+		 */
+		did_offload = console_offload_printing();
 		printk_safe_exit_irqrestore(flags);
 
-		if (do_cond_resched)
+		if (!did_offload && do_cond_resched)
 			cond_resched();
 	}
 	console_locked = 0;
@@ -2284,12 +2464,16 @@ void console_unlock(void)
 
 	/*
 	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * something to flush. Try to lock the console_sem even if we
+	 * `did_offload', because we need to know for sure that some other task
+	 * has acquired the console_sem. In case we cannot trylock the
+	 * console_sem again, there's a new owner and the console_unlock() from
+	 * them will do the flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
+	if (!retry)
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
@@ -2719,8 +2903,11 @@ late_initcall(printk_late_init);
 #if defined CONFIG_PRINTK
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
-		/* If trylock fails, someone else is doing the printing */
+	if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+		/*
+		 * If trylock fails, someone else is doing the printing.
+		 * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+		 */
 		if (console_trylock())
 			console_unlock();
 	}
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCHv3 3/5] printk: add enforce_emergency parameter
  2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 2/5] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-05-09  8:28 ` Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 4/5] printk: enable printk offloading Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 5/5] printk: register PM notifier Sergey Senozhatsky
  4 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

This parameter permits user space to forcibly switch printk emergency
mode on/off via the /sys/module/printk/parameters/enforce_emergency node.
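Assuming the series is applied, the mode could then be toggled from a
root shell as follows. The node path comes from the patch itself; the
`N' default is an assumption based on the usual bool module_param
behavior for an initially-false value:

```shell
# Read the current setting (expected to show 'N' by default).
cat /sys/module/printk/parameters/enforce_emergency

# Force direct printing: no offloading to the printk kthread.
echo 1 > /sys/module/printk/parameters/enforce_emergency

# Re-enable offloading.
echo 0 > /sys/module/printk/parameters/enforce_emergency
```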

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a113f684066c..ba640fbcee7c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -455,9 +455,15 @@ static struct task_struct *printk_kthread __read_mostly;
 static atomic_t printk_emergency __read_mostly;
 /*
  * Disable printk_kthread permanently. Unlike `oops_in_progress'
- * it doesn't go back to 0.
+ * it doesn't go back to 0 (unless set by user-space).
  */
 static bool printk_enforce_emergency __read_mostly;
+
+module_param_named(enforce_emergency, printk_enforce_emergency,
+		    bool, 0644);
+MODULE_PARM_DESC(printk_enforce_emergency,
+		 "don't offload message printing to printk kthread");
+
 /*
  * The number of lines a task can print before offloading printing
  * job to printk_kthread. 0 indicates 'no limit'.
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCHv3 4/5] printk: enable printk offloading
  2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2017-05-09  8:28 ` [RFC][PATCHv3 3/5] printk: add enforce_emergency parameter Sergey Senozhatsky
@ 2017-05-09  8:28 ` Sergey Senozhatsky
  2017-05-09  8:28 ` [RFC][PATCHv3 5/5] printk: register PM notifier Sergey Senozhatsky
  4 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Initialize the kernel printing thread and make printk offloading
possible. By default `atomic_print_limit' is set to 0, so no
offloading will take place unless requested by the user.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ba640fbcee7c..81ea575728b9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -49,6 +49,7 @@
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
 #include <linux/kthread.h>
+#include <uapi/linux/sched/types.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -2927,6 +2928,43 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 	.flags = IRQ_WORK_LAZY,
 };
 
+static int printk_kthread_func(void *data)
+{
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!test_bit(PRINTK_PENDING_OUTPUT, &printk_pending))
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
+/*
+ * Init printk kthread at late_initcall stage, after core/arch/device/etc.
+ * initialization.
+ */
+static int __init init_printk_kthread(void)
+{
+	struct task_struct *thread;
+	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+
+	thread = kthread_run(printk_kthread_func, NULL, "printk");
+	if (IS_ERR(thread)) {
+		pr_err("printk: unable to create printing thread\n");
+		return PTR_ERR(thread);
+	}
+
+	sched_setscheduler(thread, SCHED_FIFO, &param);
+	printk_kthread = thread;
+	return 0;
+}
+late_initcall(init_printk_kthread);
+
 void wake_up_klogd(void)
 {
 	preempt_disable();
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCHv3 5/5] printk: register PM notifier
  2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2017-05-09  8:28 ` [RFC][PATCHv3 4/5] printk: enable printk offloading Sergey Senozhatsky
@ 2017-05-09  8:28 ` Sergey Senozhatsky
  2017-05-30  9:55   ` Petr Mladek
  4 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  8:28 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

It's not always possible/safe to wake_up() the printk kernel
thread. For example, late suspend/early resume may printk()
while timekeeping is not yet initialized, so calling into the
scheduler may result in recursive warnings.

Another thing to note is the fact that PM at some point
freezes user-space and kernel threads: freeze_processes()
and freeze_kernel_threads(), respectively. Thus we need
printk() to operate in emergency mode there and attempt to
immediately flush pending kernel messages to the console.

This patch registers a PM notifier, so PM can switch printk
to emergency mode from the PM_FOO_PREPARE notifiers and
return to threaded printk mode from the PM_POST_FOO notifiers.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Andreas Mohr <andi@lisas.de>
---
 kernel/printk/printk.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 81ea575728b9..6aae36a29aca 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -50,6 +50,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/kthread.h>
 #include <uapi/linux/sched/types.h>
+#include <linux/suspend.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -2928,6 +2929,30 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 	.flags = IRQ_WORK_LAZY,
 };
 
+static int printk_pm_notify(struct notifier_block *notify_block,
+			    unsigned long mode, void *unused)
+{
+	switch (mode) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+	case PM_RESTORE_PREPARE:
+		printk_emergency_begin();
+		break;
+
+	case PM_POST_SUSPEND:
+	case PM_POST_HIBERNATION:
+	case PM_POST_RESTORE:
+		printk_emergency_end();
+		break;
+	}
+
+	return 0;
+}
+
+static struct notifier_block printk_pm_nb = {
+	.notifier_call = printk_pm_notify,
+};
+
 static int printk_kthread_func(void *data)
 {
 	while (1) {
@@ -2961,6 +2986,8 @@ static int __init init_printk_kthread(void)
 
 	sched_setscheduler(thread, SCHED_FIFO, &param);
 	printk_kthread = thread;
+
+	WARN_ON(register_pm_notifier(&printk_pm_nb) != 0);
 	return 0;
 }
 late_initcall(init_printk_kthread);
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-09  8:28 ` [RFC][PATCHv3 2/5] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-05-10  5:59   ` Sergey Senozhatsky
  2017-05-29  9:29     ` Petr Mladek
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-10  5:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

Fix the !PRINTK config build.
(Reported-by: kbuild test robot <lkp@intel.com>).

====================

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH 2/5] printk: introduce printing kernel thread

printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
sysfs param, which sets the limit on the number of lines a process can
print from console_unlock(). Value 0 corresponds to the current behavior
(no limitation). The printing offload happens in the console_unlock()
function and, briefly, looks as follows: as soon as a process prints more
than `atomic_print_limit' lines it attempts to offload printing to another
process. Since nothing guarantees that there will be another process
sleeping on the console_sem or calling printk() on another CPU
simultaneously, the patch also introduces an auxiliary kernel thread -
printk_kthread, the main purpose of which is to take over printing duty.
The workflow thus turns into: as soon as a process prints more than
`atomic_print_limit' lines it wakes up printk_kthread and unlocks the
console_sem. So in the best case at this point there will be at least one
process trying to lock the console_sem: printk_kthread. (There can also
be a process that was sleeping on the console_sem and that was woken up
by console semaphore up(); and concurrent printk() invocations from other
CPUs). But in the worst case there won't be any processes ready to take
over the printing duty: it may take printk_kthread some time to become
running; or printk_kthread may even never become running (a misbehaving
scheduler, or some other critical condition). That's why after we
wake_up() printk_kthread we can't immediately leave the printing loop, we
must ensure that the console_sem has a new owner before we do so.
Therefore, `atomic_print_limit' is a soft limit, not a hard one: we let
the task overrun `atomic_print_limit'. But, at the same time, the
console_unlock() printing loop behaves differently for tasks that have
exceeded `atomic_print_limit': after every printed logbuf entry
(call_console_drivers()) such a process wakes up printk_kthread, unlocks
the console_sem and attempts to console_trylock() a bit later (if there
are any pending messages in the logbuf, of course). In the best case
scenario either printk_kthread or some other task will lock the
console_sem, so the current printing task will see a failed
console_trylock(), which indicates a successful printing offload. In the
worst case, however, current will successfully console_trylock(), which
will indicate that offloading did not take place and we can't return from
console_unlock(), so the printing task will print one more line from the
logbuf and attempt to offload printing once again; and it will continue
doing so until another process locks the console_sem or until there are
no pending messages left in the logbuf. So if everything goes wrong - we
can't wake up printk_kthread and there are no other processes sleeping on
the console_sem or trying to down() it - then we will have the existing
console_unlock() behavior: print all pending messages in one shot.
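The soft-limit behaviour described above can be modeled in a few lines of
userspace C. This is only an illustrative sketch: the limit value, the
print_until_offloaded() helper and the "takeover point" parameter are
assumptions made for the demonstration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value; the real limit is a module parameter. */
static unsigned int atomic_print_limit = 4;

/*
 * Model of the console_unlock() printing loop: the task may overrun
 * `atomic_print_limit' (it is a soft limit) and keeps printing one
 * line at a time until offloading succeeds -- modeled here as some
 * other task taking over after `takeover_after' printed lines.
 */
static unsigned int print_until_offloaded(unsigned int pending_lines,
					  unsigned int takeover_after)
{
	unsigned int printed = 0;

	while (pending_lines) {
		printed++;		/* call_console_drivers() stand-in */
		pending_lines--;

		if (printed < atomic_print_limit)
			continue;

		/*
		 * wake_up_process(printk_kthread) + up_console_sem();
		 * offloading succeeded once console_trylock() fails,
		 * i.e. once another task holds the console_sem.
		 */
		if (printed >= takeover_after)
			break;
	}
	return printed;
}
```

If nobody ever takes over, the task simply keeps printing until the
buffer is drained, which is exactly the existing console_unlock()
behaviour.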

We track the number of unsuccessful offloading attempts and, after some
point, declare a `printk_emergency' condition and give up trying to
offload. `printk_emergency' is a new name for the printk mode in which
printk does not attempt to offload printing, but instead flushes all
pending logbuf messages (basically, the existing behaviour).

There are also other cases when we can't (or should avoid) offloading.
For example, we can't call into the scheduler from panic(), because this
may cause a deadlock. Therefore printk() has some additional places
where it switches to printk_emergency mode: for instance, once an EMERG
log level message appears in the log buffer. We also provide two new
functions that can be used when a path needs to declare a temporary
`printk_emergency' mode:

 -- printk_emergency_begin()
    Disables printk offloading. All printk() calls will attempt
    to lock the console_sem and, if successful, flush kernel log
    messages.

 -- printk_emergency_end()
    Enables printk offloading.
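Usage-wise, the two functions act as a nesting counter. Here is a
userspace sketch of the intended semantics; the helper names mirror the
patch, but the plain-int counter is a single-threaded simplification of
the in-kernel atomic_t:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: the kernel uses atomic_t printk_emergency. */
static int printk_emergency;

/* Disable printk offloading; calls may nest. */
static void printk_emergency_begin(void)
{
	printk_emergency++;
}

/* Re-enable printk offloading once all begin() calls are paired. */
static void printk_emergency_end(void)
{
	printk_emergency--;
}

static bool printk_offloading_enabled(void)
{
	return printk_emergency == 0;
}
```

Offloading stays disabled as long as any caller still has an outstanding
printk_emergency_begin().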

Offloading is not possible yet; it will be enabled in a later patch.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h |   3 +
 kernel/printk/printk.c  | 204 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 199 insertions(+), 8 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2cb7f4753b76..d78a01fa959b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
+#include <linux/kthread.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
 	} while (0)
 
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
  */
 #define PRINTK_PENDING_WAKEUP	0x01
 #define PRINTK_PENDING_OUTPUT	0x02
@@ -445,6 +447,132 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread __read_mostly;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_enforce_emergency __read_mostly;
+/*
+ * The number of lines a task can print before offloading printing
+ * job to printk_kthread. 0 indicates 'no limit'.
+ */
+static unsigned int atomic_print_limit;
+
+module_param_named(atomic_print_limit, atomic_print_limit, uint, 0644);
+MODULE_PARM_DESC(atomic_print_limit,
+		 "max lines to print before offloading to printk kthread");
+
+static inline bool printk_offloading_enabled(void)
+{
+	return atomic_print_limit &&
+		printk_kthread &&
+		!printk_enforce_emergency &&
+		atomic_read(&printk_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void printk_emergency_begin(void)
+{
+	atomic_inc(&printk_emergency);
+}
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+	atomic_dec(&printk_emergency);
+}
+
+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * number of lines a process can print from console_unlock().
+ *
+ * There is one more reason to do offloading - there might be other processes
+ * (including user space tasks) sleeping on the console_sem in uninterruptible
+ * state; and keeping the console_sem locked for long time may have a negative
+ * impact on them.
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool console_offload_printing(void)
+{
+	static struct task_struct *printing_task;
+	static unsigned long long lines_printed;
+	static unsigned long saved_csw;
+
+	if (!printk_offloading_enabled())
+		return false;
+
+	if (system_state != SYSTEM_RUNNING || oops_in_progress)
+		return false;
+
+	/* A new task - reset the counters. */
+	if (printing_task != current) {
+		lines_printed = 0;
+		printing_task = current;
+		saved_csw = current->nvcsw + current->nivcsw;
+		return false;
+	}
+
+	lines_printed++;
+	if (lines_printed < atomic_print_limit)
+		return false;
+
+	if (current == printk_kthread) {
+		/*
+		 * Reset the `lines_printed' counter, just in case if
+		 * printk_kthread is the only process left that would
+		 * down() console_sem and call console_unlock().
+		 */
+		lines_printed = 0;
+		return true;
+	}
+
+	/*
+	 * A trivial emergency enforcement - give up on printk_kthread if
+	 * we can't wake it up. This assumes that `atomic_print_limit' is
+	 * reasonably and sufficiently large.
+	 */
+	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
+			saved_csw == (current->nvcsw + current->nivcsw)) {
+		printk_enforce_emergency = true;
+		pr_crit("Declaring printk emergency mode.\n");
+		return false;
+	}
+
+	/*
+	 * Wakeup `printk_kthread' but don't reset the `lines_printed'
+	 * counter, let task overrun the limit. The idea is that
+	 *
+	 *   a) woken up printk_kthread (if succeeded)
+	 * or
+	 *   b) concurrent printk from another CPU or a task from console_sem
+	 *      wait list (if any)
+	 *
+	 * will take over, change `printing_task' and reset the counter.
+	 * If neither a) nor b) happens - we continue printing from
+	 * current process. Which is bad and can be risky, but since
+	 * we couldn't wake_up() printk_kthread, the things already
+	 * don't look normal.
+	 */
+	wake_up_process(printk_kthread);
+	return true;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1765,6 +1893,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	/*
+	 * Emergency level indicates that the system is unstable and, thus,
+	 * we better stop relying on wake_up(printk_kthread) and try to do
+	 * a direct printing.
+	 */
+	if (level == LOGLEVEL_EMERG)
+		printk_enforce_emergency = true;
+
+	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -1882,6 +2019,10 @@ static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
+void printk_emergency_begin(void) {}
+void printk_emergency_end(void) {}
+static inline bool console_offload_printing(void) { return false; }
+
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2177,8 +2318,17 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry;
+	bool did_offload;
 
 	if (console_suspended) {
+		/*
+		 * Here and later, we need to clear the PENDING_OUTPUT bit
+		 * in order to avoid an infinite loop in printk_kthread
+		 * function when console_unlock() cannot flush messages
+		 * because we suspended consoles. Someone else will print
+		 * the messages from resume_console().
+		 */
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		up_console_sem();
 		return;
 	}
@@ -2200,6 +2350,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	did_offload = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2207,6 +2358,7 @@ void console_unlock(void)
 	 * console.
 	 */
 	if (!can_use_console()) {
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 		console_locked = 0;
 		up_console_sem();
 		return;
@@ -2235,7 +2387,7 @@ void console_unlock(void)
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (did_offload || console_seq == log_next_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2267,9 +2419,38 @@ void console_unlock(void)
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		/*
+		 * Sometimes current task may re-acquire console_sem ahead of
+		 * just woken up printk_kthread, because we let processes to
+		 * compete for console_sem. In this case current task will jump
+		 * to `again' label (if there are any pending messages), print
+		 * one more line, break out of printing loop (we don't reset
+		 * `lines_printed' after wake_up_process() exactly for this
+		 * purpose) and do up_console_sem(); this will be repeated as
+		 * long as current task successfully console_trylock()-s.
+		 *
+		 * Eventually and hopefully, either printk_kthread or some
+		 * other task will acquire console_sem and current process will
+		 * be able to leave console_unlock(). Until this happens we
+		 * cannot be sure that anyone will take over and flush the
+		 * `logbuf' messages. So in the worst case scenario we,
+		 * basically, have old printk()/console_unlock() behaviour:
+		 * a task prints all pending logbuf messages.
+		 *
+		 * If printk_kthread never wakes up (which may indicate that
+		 * the system is unstable or something weird is going on),
+		 * then we keep jumping to `again' label printing one message
+		 * from the logbuf each time. This is a bit ugly, but at least
+		 * we will print out the logbuf. If such condition occurs,
+		 * console_offload_printing() should declare `printk_emergency'
+		 * at some point and we will stop all attempts to offload
+		 * printing.
+		 */
+		did_offload = console_offload_printing();
 		printk_safe_exit_irqrestore(flags);
 
-		if (do_cond_resched)
+		if (!did_offload && do_cond_resched)
 			cond_resched();
 	}
 	console_locked = 0;
@@ -2284,12 +2465,16 @@ void console_unlock(void)
 
 	/*
 	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * something to flush. Try to lock the console_sem even if we
+	 * `did_offload', because we need to know for sure that some other task
+	 * has acquired the console_sem. In case we cannot trylock the
+	 * console_sem again, there's a new owner and the console_unlock() from
+	 * them will do the flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
+	if (!retry)
+		clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
@@ -2719,8 +2904,11 @@ late_initcall(printk_late_init);
 #if defined CONFIG_PRINTK
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
-		/* If trylock fails, someone else is doing the printing */
+	if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+		/*
+		 * If trylock fails, someone else is doing the printing.
+		 * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+		 */
 		if (console_trylock())
 			console_unlock();
 	}
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu
  2017-05-09  8:28 ` [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu Sergey Senozhatsky
@ 2017-05-25 12:11   ` Petr Mladek
  2017-05-25 12:36     ` Sergey Senozhatsky
  0 siblings, 1 reply; 51+ messages in thread
From: Petr Mladek @ 2017-05-25 12:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael J . Wysocki, Eric Biederman, Greg Kroah-Hartman,
	Jiri Slaby, Pavel Machek, Andreas Mohr, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Tue 2017-05-09 17:28:55, Sergey Senozhatsky wrote:
> Do not keep `printk_pending' in per-CPU area. We set the following bits
> of printk_pending:
> a) PRINTK_PENDING_WAKEUP
> 	when we need to wakeup klogd
> b) PRINTK_PENDING_OUTPUT
> 	when there is a pending output from deferred printk and we need
> 	to call console_unlock().
> 
> So none of the bits control/represent a state of a particular CPU and,
> basically, they should be global instead.
> 
> Besides we will use `printk_pending' to control printk kthread, so this
> patch is also a preparation work.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/printk/printk.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a1aecf44ab07..2cb7f4753b76 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -401,6 +401,14 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
>  		printk_safe_exit_irqrestore(flags);	\
>  	} while (0)
>  
> +/*
> + * Delayed printk version, for scheduler-internal messages:
> + */
> +#define PRINTK_PENDING_WAKEUP	0x01
> +#define PRINTK_PENDING_OUTPUT	0x02
> +
> +static unsigned long printk_pending;
> +
>  #ifdef CONFIG_PRINTK

The variable is used only when CONFIG_PRINTK is defined at
the moment, so this produces a warning about an unused variable.

Otherwise, the patch looks fine to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu
  2017-05-25 12:11   ` Petr Mladek
@ 2017-05-25 12:36     ` Sergey Senozhatsky
  2017-05-25 12:43       ` Petr Mladek
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-25 12:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On (05/25/17 14:11), Petr Mladek wrote:
[..]
> The variable is used only when CONFIG_PRINTK is defined at
> the moment, so this produces a warning about an unused variable.
> 

interesting... `printk_pending' is used in console_unlock(), which is
always available.

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu
  2017-05-25 12:36     ` Sergey Senozhatsky
@ 2017-05-25 12:43       ` Petr Mladek
  0 siblings, 0 replies; 51+ messages in thread
From: Petr Mladek @ 2017-05-25 12:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Thu 2017-05-25 21:36:33, Sergey Senozhatsky wrote:
> On (05/25/17 14:11), Petr Mladek wrote:
> [..]
> > The variable is used only when CONFIG_PRINTK is defined at
> > the moment, so this produces a warning about an unused variable.
> > 
> 
> interesting... `printk_pending' is used in console_unlock(), which is
> always available.

Yeah, I see. But this usage is introduced in the 2nd patch. I started
the review sequentially ;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-10  5:59   ` Sergey Senozhatsky
@ 2017-05-29  9:29     ` Petr Mladek
  2017-05-29 12:12       ` Jan Kara
  2017-05-31  7:22       ` Sergey Senozhatsky
  0 siblings, 2 replies; 51+ messages in thread
From: Petr Mladek @ 2017-05-29  9:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Wed 2017-05-10 14:59:35, Sergey Senozhatsky wrote:
> This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
> sysfs param, which sets the limit on number of lines a process can print
> from console_unlock(). Value 0 corresponds to the current behavior (no
> limitation). The printing offloading is happening from console_unlock()
> function and, briefly, looks as follows: as soon as process prints more
> than `atomic_print_limit' lines it attempts to offload printing to another
> process. Since nothing guarantees that there will another process sleeping
> on the console_sem or calling printk() on another CPU simultaneously, the
> patch also introduces an auxiliary kernel thread - printk_kthread, the
> main purpose of which is to take over printing duty. The workflow is, thus,
> turns into: as soon as process prints more than `atomic_print_limit' lines
> it wakes up printk_kthread and unlocks the console_sem. So in the best case
> at this point there will be at least 1 processes trying to lock the
> console_sem: printk_kthread. (There can also be a process that was sleeping
> on the console_sem and that was woken up by console semaphore up(); and
> concurrent printk() invocations from other CPUs). But in the worst case
> there won't be any processes ready to take over the printing duty: it
> may take printk_kthread some time to become running; or printk_kthread
> may even never become running (a misbehaving scheduler, or other critical
> condition). That's why after we wake_up() printk_kthread we can't
> immediately leave the printing loop, we must ensure that the console_sem
> has a new owner before we do so. Therefore, `atomic_print_limit' is a soft
> limit, not the hard one: we let task to overrun `atomic_print_limit'.
> But, at the same time, the console_unlock() printing loop behaves differently
> for tasks that have exceeded `atomic_print_limit': after every printed
> logbuf entry (call_console_drivers()) such a process wakes up printk_kthread,
> unlocks the console_sem and attempts to console_trylock() a bit later
> (if there any are pending messages in the logbuf, of course). In the best case
> scenario either printk_kthread or some other tasks will lock the console_sem,
> so current printing task will see failed console_trylock(), which will
> indicate a successful printing offloading. In the worst case, however,
> current will successfully console_trylock(), which will indicate that
> offloading did not take place and we can't return from console_unlock(),
> so the printing task will print one more line from the logbuf and attempt
> to offload printing once again; and it will continue doing so until another
> process locks the console_sem or until there are pending messages in the
> logbuf. So if everything goes wrong - we can't wakeup printk_kthread and
> there are no other processes sleeping on the console_sem or trying to down()
> it - then we will have the existing console_unlock() behavior: print all
> pending messages in one shot.

Please, try to avoid such long paragraphs ;-) It looks fine when you
read it for the first time. But it becomes problematic when you try
to go back and re-read some detail.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2cb7f4753b76..a113f684066c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2155,6 +2216,85 @@ static inline int can_use_console(void)
>  	return cpu_online(raw_smp_processor_id()) || have_callable_console();
>  }
>  
> +/*
> + * Under heavy printing load or with a slow serial console (or both)
> + * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
> + * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
> + * number of lines a process can print from console_unlock().
> + *
> + * There is one more reason to do offloading - there might be other processes
> + * (including user space tasks) sleeping on the console_sem in uninterruptible
> + * state; and keeping the console_sem locked for long time may have a negative
> + * impact on them.
> + *
> + * This function must be called from 'printk_safe' context under
> + * console_sem lock.
> + */
> +static inline bool console_offload_printing(void)
> +{
> +	static struct task_struct *printing_task;
> +	static unsigned long long lines_printed;
> +	static unsigned long saved_csw;
> +
> +	if (!printk_offloading_enabled())
> +		return false;
> +
> +	if (system_state != SYSTEM_RUNNING || oops_in_progress)
> +		return false;

Wow, I really like this test. I was not aware of this variable.

Well, I would allow the offload to be used during boot as well.
We have reports of softlockups happening during boot.
The offloading can easily be disabled via the command line
but not the other way around.


> +	/* A new task - reset the counters. */
> +	if (printing_task != current) {
> +		lines_printed = 0;
> +		printing_task = current;
> +		saved_csw = current->nvcsw + current->nivcsw;
> +		return false;
> +	}
> +
> +	lines_printed++;
> +	if (lines_printed < atomic_print_limit)
> +		return false;
> +
> +	if (current == printk_kthread) {
> +		/*
> +		 * Reset the `lines_printed' counter, just in case if
> +		 * printk_kthread is the only process left that would
> +		 * down() console_sem and call console_unlock().
> +		 */
> +		lines_printed = 0;
> +		return true;
> +	}
> +
> +	/*
> +	 * A trivial emergency enforcement - give up on printk_kthread if
> +	 * we can't wake it up. This assumes that `atomic_print_limit' is
> +	 * reasonably and sufficiently large.
> +	 */
> +	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
> +			saved_csw == (current->nvcsw + current->nivcsw)) {
> +		printk_enforce_emergency = true;
> +		pr_crit("Declaring printk emergency mode.\n");
> +		return false;
> +	}

This is an interesting way to detect that the system is in an
emergency state. The question is how reliable it is.

It might work only if the woken printk kthread is scheduled within a
predictable time period. You try to achieve this by setting real-time
priority (4th patch of this patchset). The question is how this would work:


First, the real time priority is questionable on its own. Logging
is important but the real time priority is dangerous. Any "flood"
of messages will starve all other processes with normal priority.
It is better than a softlockup but it might cause problems as well.


Second, it might be pretty hard to tune the prediction of the
emergency situation. The above formula might depend on:

  + the speed of active consoles

  + length of the printed lines

  + the number and priorities of other real time tasks
    on the system and the related average time when
    printk kthread is scheduled

  + HZ size; the lower HZ the more messages might be
    handled in one jiffy

  + the current CPU speed, ...


Third, if the console_trylock() fails below, it is not guaranteed that
the other waiter will really continue printing. The console_trylock()
would fail because someone else called console_lock(), failed to get it,
and went to sleep. There is no guarantee that it will be woken
once again.

I have discussed this with my colleagues. They think that we are
lost anyway if the scheduling does not work. It is because many
console drivers use sleeping locks and depend on scheduling.

They suggested that I write down what we really wanted to achieve
first. I thought that I knew it. But it actually might be
interesting to write it down:


Ideal printk:

  + show and store all messages to know what's going on


Current printk does the best effort to:

  + store "all" messages into logbuf_log and

      + show important ones on console
      + allow to see/store them in userspace
      + allow to find them in crashdump


Current printk has limitations:

  + storing the messages:

      + limited buffer size => losing messages
      + possible deadlocks => temporarily passing via alternative
			buffers in NMI, recursion


  + userspace access ?:

      + on demand => may be further behind reality than it appears


  + console output:

      + slow => async =>
		+ might be far behind
		+ losing messages
		+ showing only ones with high log level
		+ possible softlockup
		+ not guaranteed when dying

      + possible deadlock => explicitly async in sched, NMI


Now, this patchset is trying to avoid the softlockup caused by console
output by making it even more async. This makes the other problems of
the async output more visible.

The patch tries to reduce the negative impact by detecting when
the negative impact is visible and switching back to the sync mode.

The fact is that we need to handle the emergency situation well.
This is where the system has troubles and the messages might
be crucial to debug the problem and fix it.


<repeating myself>

The question is if we are able to detect the emergency situation
reliable and if the solution does not bring other problems.

If the detection has too many false positives, it will not help much
with the original problem (softlockup). If it has too many false
negatives, it will increase the negative impact of the limits.

Also adding kthread with real time priority might have strange
side effects on its own.

</repeating myself>


Let me to look at it from the other side. What are the emergency
situations?

  + oops/panic() - easy to detect
  + suspend, shutdown, halt - rather easy to detect as well
  + flood of messages => console far behind logbuf_log
  + sudden dead without triggering panic()

Did I miss anything?

Anyway, we could detect the first three situations rather well:

  + oops/panic by a check for oops_in_progress

  + suspend, shutdown, ... by the check of system_state
    and the notifiers in the 5th patch

  + I would suggest to detect flood of messages by a difference
    between log_next_seq and console_seq. We could take into
    account the number of messages in the buffer
    (log_next_seq - log_first_seq)

  + I do not see a reliable way how to detect the sudden dead.
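The flood check suggested above could be sketched as follows. The 1/2
threshold and the console_flooded() helper name are made-up examples for
illustration, not values from this patch set:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long u64;

/*
 * Consider the console flooded when the number of not-yet-printed
 * records (log_next_seq - console_seq) exceeds half of all records
 * currently held in the buffer (log_next_seq - log_first_seq).
 */
static bool console_flooded(u64 log_first_seq, u64 log_next_seq,
			    u64 console_seq)
{
	u64 in_buffer = log_next_seq - log_first_seq;
	u64 pending = log_next_seq - console_seq;

	return in_buffer && pending > in_buffer / 2;
}
```

The sequence counters are monotonically increasing, so the differences
are cheap to compute under logbuf_lock.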


Please note, that we do not really need that questionable
trick with console_unlock(), console_trylock(), counting handled
messages over the limit, and realtime kthread.


In addition, I think that a semi-sync-async mode might better
handle the sudden dead. What is it? I have already talked about
something similar last time.

Normal printk() should do something like:

  printk()
    vprintk_emit()
      log_store()
      if (!deferred)
	console_trylock()
	console_unlock()
	  while (num_lines < atomic_print_limit)
	    handle message
	  up_console_sem()
	  if (pending_messages)
	    wakeup(printk_kthread)


and printk_kthread should do:

   while() {

       if (!pending_messages)
	 sleep()

       console_lock()

       preempt_disable()
       console_unlock()
	 while (pending_message && !need_resched)
	   handle message
	 up_console_sem()
       preempt_enable()
    }


By other words:

  + printk() should try to get console and handle few lines
    synchronously. It will handle its own message in most cases.

  + printk_kthread is only a fallback. It is woken when there are
    unhandled messages. It handles as many messages as
    possible in its assigned time slot. But it releases
    console_sem before it goes to sleep.


The advantages:

   + all messages are handled synchronously if there is not
     a flood of messages

   + if there is a flood of messages, the kthread works only
     as a fallback; processes that produce the messages
     are involved in handling the console (natural throttling);
     the messages appear on the console even when printk_kthread
     is sleeping; but the softlockup is prevented

=> it should work pretty well in all situations, including the flood
   of messages and sudden dead.

What do you think, please?

Did I make a mistake in the logic?
Did I miss some important reason for an emergency situation?

I am sorry for the long mail. It took me a long time to sort out
the ideas. I hope that it makes sense.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-29  9:29     ` Petr Mladek
@ 2017-05-29 12:12       ` Jan Kara
  2017-05-31  7:30         ` Sergey Senozhatsky
  2017-05-31  7:22       ` Sergey Senozhatsky
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Kara @ 2017-05-29 12:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On Mon 29-05-17 11:29:06, Petr Mladek wrote:
> On Wed 2017-05-10 14:59:35, Sergey Senozhatsky wrote:
> > This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
> > sysfs param, which sets the limit on number of lines a process can print
> > from console_unlock(). Value 0 corresponds to the current behavior (no
> > limitation). The printing offloading is happening from console_unlock()
> > function and, briefly, looks as follows: as soon as a process prints more
> > than `atomic_print_limit' lines it attempts to offload printing to another
> > process. Since nothing guarantees that there will be another process sleeping
> > on the console_sem or calling printk() on another CPU simultaneously, the
> > patch also introduces an auxiliary kernel thread - printk_kthread, the
> > main purpose of which is to take over the printing duty. The workflow thus
> > turns into: as soon as a process prints more than `atomic_print_limit' lines
> > it wakes up printk_kthread and unlocks the console_sem. So in the best case
> > at this point there will be at least 1 process trying to lock the
> > console_sem: printk_kthread. (There can also be a process that was sleeping
> > on the console_sem and that was woken up by console semaphore up(); and
> > concurrent printk() invocations from other CPUs). But in the worst case
> > there won't be any processes ready to take over the printing duty: it
> > may take printk_kthread some time to become running; or printk_kthread
> > may even never become running (a misbehaving scheduler, or other critical
> > condition). That's why after we wake_up() printk_kthread we can't
> > immediately leave the printing loop, we must ensure that the console_sem
> > has a new owner before we do so. Therefore, `atomic_print_limit' is a soft
> > limit, not a hard one: we let the task overrun `atomic_print_limit'.
> > But, at the same time, the console_unlock() printing loop behaves differently
> > for tasks that have exceeded `atomic_print_limit': after every printed
> > logbuf entry (call_console_drivers()) such a process wakes up printk_kthread,
> > unlocks the console_sem and attempts to console_trylock() a bit later
> > (if there are any pending messages in the logbuf, of course). In the best case
> > scenario either printk_kthread or some other tasks will lock the console_sem,
> > so the current printing task will see a failed console_trylock(), which will
> > indicate a successful printing offloading. In the worst case, however,
> > current will successfully console_trylock(), which will indicate that
> > offloading did not take place and we can't return from console_unlock(),
> > so the printing task will print one more line from the logbuf and attempt
> > to offload printing once again; and it will continue doing so until another
> > process locks the console_sem or until there are no pending messages in the
> > logbuf. So if everything goes wrong - we can't wakeup printk_kthread and
> > there are no other processes sleeping on the console_sem or trying to down()
> > it - then we will have the existing console_unlock() behavior: print all
> > pending messages in one shot.

Actually I had something very similar in old versions of my patch set. And
it didn't work very well. The problem was that e.g. sometimes scheduler
decided that printk kthread should run on the same CPU as the process
currently doing printing and in such case printk kthread never took over
printing and the machine locked up due to heavy printing.

> First, the real time priority is questionable on its own. Logging
> is important but the real time priority is dangerous. Any "flood"
> of messages will starve all other processes with normal priority.
> It is better than a softlockup but it might cause problems as well.

Processes with real-time priority should have a well-bounded runtime (in
milliseconds). The printk kthread doesn't have such a bounded runtime, so it
should not be a real-time process as it could hog the CPU it is running on...

So I think what Petr suggests below is better. Keep normal priority, print
something to console from the process doing printk() and just wake up
printk kthread and hope it can print the rest. It is not ideal but unless
there's a flood of messages there is no regression to current state.

> In addition, I think that a semi-sync-async mode might better
> handle the sudden dead. What is it? I have already talked about
> something similar last time.
> 
> Normal printk() should do something like:
> 
>   printk()
>     vprintk_emit()
>       log_store()
>       if (!deferred)
> 	console_trylock()
> 	console_unlock()
> 	  while (num_lines < atomic_print_limit)
> 	    handle message
> 	  up_console_sem()
> 	  if (pending_messages)
> 	    wakeup(printk_kthread)
> 
> 
> and printk_kthread should do:
> 
>    while() {
> 
>        if (!pending_messages)
> 	 sleep()
> 
>        console_lock()
> 
>        preempt_disable()
>        console_unlock()
> 	 while (pending_message && !need_resched)
> 	   handle message
> 	 up_console_sem()
>        preempt_enable()
>     }
> 
> 
> In other words:
> 
>   + printk() should try to get the console and handle a few lines
>     synchronously. It will handle its own message in most cases.
> 
>   + printk_kthread is only a fallback. It is woken when there are
>     unhandled messages. It handles as many messages as
>     possible in its assigned time slot, but it releases
>     console_sem before it goes to sleep.
> 
> 
> The advantages:
> 
>    + all messages are handled synchronously if there is not
>      a flood of messages
> 
>    + if there is a flood of messages, the kthread works only
>      as a fallback; processes that produce the messages
>      are involved in handling the console (natural throttling);
>      the messages appear on the console even when printk_kthread
>      is sleeping; but the softlockup is prevented
> 
> => it should work pretty well in all situations, including the flood
>    of messages and sudden death.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC][PATCHv3 5/5] printk: register PM notifier
  2017-05-09  8:28 ` [RFC][PATCHv3 5/5] printk: register PM notifier Sergey Senozhatsky
@ 2017-05-30  9:55   ` Petr Mladek
  2017-05-31  6:42     ` Sergey Senozhatsky
  0 siblings, 1 reply; 51+ messages in thread
From: Petr Mladek @ 2017-05-30  9:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael J . Wysocki, Eric Biederman, Greg Kroah-Hartman,
	Jiri Slaby, Pavel Machek, Andreas Mohr, Tetsuo Handa,
	linux-kernel, Sergey Senozhatsky

On Tue 2017-05-09 17:28:59, Sergey Senozhatsky wrote:
> It's not always possible/safe to wake_up() the printk kernel
> thread. For example, late suspend/early resume may printk()
> while timekeeping is not initialized yet, so calling into the
> scheduler may result in recursive warnings.
> 
> Another thing to notice is the fact that PM at some point
> freezes user space and kernel threads: freeze_processes()
> and freeze_kernel_threads(), respectively. Thus we need
> printk() to operate in emergency mode there and attempt to
> immediately flush pending kernel messages to the console.
> 
> This patch registers a PM notifier, so PM can switch printk
> to emergency mode from PM_FOO_PREPARE notifiers and return
> back to printk threaded mode from PM_POST_FOO notifiers.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Suggested-by: Andreas Mohr <andi@lisas.de>
> ---
>  kernel/printk/printk.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 81ea575728b9..6aae36a29aca 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2928,6 +2929,30 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
>  	.flags = IRQ_WORK_LAZY,
>  };
>  
> +static int printk_pm_notify(struct notifier_block *notify_block,
> +			    unsigned long mode, void *unused)
> +{
> +	switch (mode) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +	case PM_RESTORE_PREPARE:
> +		printk_emergency_begin();
> +		break;
> +
> +	case PM_POST_SUSPEND:
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_RESTORE:
> +		printk_emergency_end();
> +		break;
> +	}
> +
> +	return 0;

Heh, it seems that the meaning of return values
is a bit unclear and messy. For example see an older problem
with android memory handlers at https://lkml.org/lkml/2012/4/9/177

My understanding is that notifiers should return NOTIFY_OK
when they handled it and NOTIFY_DONE when they did nothing.
For example, see cpu_hotplug_pm_callback().

In reality, there does not seem to be a big difference,
see notifier_to_errno() in __pm_notifier_call_chain().

Anyway, I would try to follow the cpu_hotplug_pm_callback()
example.


> +}
> +
> +static struct notifier_block printk_pm_nb = {
> +	.notifier_call = printk_pm_notify,
> +};
> +
>  static int printk_kthread_func(void *data)
>  {
>  	while (1) {
> @@ -2961,6 +2986,8 @@ static int __init init_printk_kthread(void)
>  
>  	sched_setscheduler(thread, SCHED_FIFO, &param);
>  	printk_kthread = thread;
> +
> +	WARN_ON(register_pm_notifier(&printk_pm_nb) != 0);

I think that a simple error message might be enough.

Also we might want to force the emergency mode in case of error.
Otherwise, the messages will not appear during suspend and such
operations.


BTW: Do you know about some locations that would need to be
patched explicitly, for example, kexec?

Some old versions of this patch touched console_suspend().
This brought me to snapshot_ioctl(). It looks like an
API that allows creating snapshots from user space.
I wonder if we should switch to the emergency mode
there as well, probably in the SNAPSHOT_FREEZE
stage.

Best Regards,
Petr


* Re: [RFC][PATCHv3 5/5] printk: register PM notifier
  2017-05-30  9:55   ` Petr Mladek
@ 2017-05-31  6:42     ` Sergey Senozhatsky
  2017-06-27 14:45       ` Petr Mladek
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-31  6:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

Hello Petr,

thanks for taking a look.

On (05/30/17 11:55), Petr Mladek wrote:
[..]
> My understanding is that notifiers should return NOTIFY_OK
> when they handled it and NOTIFY_DONE when they did nothing.
> For example, see cpu_hotplug_pm_callback().
> 
> In reality, there does not seem to be a big difference,
> see notifier_to_errno() in __pm_notifier_call_chain().
> 
> Anyway, I would try to follow the cpu_hotplug_pm_callback()
> example.

good points, will fix.

> > +	WARN_ON(register_pm_notifier(&printk_pm_nb) != 0);
> 
> I think that a simple error message might be enough.
> 
> Also we might want to force the emergency mode in case of error.
> Otherwise, the messages will not appear during suspend and such
> operations.

good point, will fix.

> BTW: Do you know about some locations that would need to be
> patched explicitly, for example, kexec?

a tricky question and I need to think more. I'm a bit overloaded
with some other stuff now, sorry for the delayed replies.

in case of kexec_image->preserve_context my assumption was that
PM would send out PM notifications. but it seems that I need to have
a second look. quite possible we will have to register yet another
PM notifier - add syscore_ops to syscore_ops_list via
register_syscore_ops() - the one that would be called from
syscore_suspend().

the other branch invokes kernel_restart_prepare(), which updates
`system_state' variable and we test it in console_offload_printing().

some other cases also update `system_state'. the exception is maybe
sysrq emergency restart. but I'm a bit uncertain here. on one hand the
expected emergency restart behavior is "just reboot the system, OK?",
on the other hand there are people who expect kernel log messages to
be flushed before that "just reboot the system" thing.

> Some old versions of this patch touched console_suspend().
> This brought me to snapshot_ioctl(). It looks like an
> API that allows to create snapshots from user space.
> I wonder if we should switch to the emergency mode
> there are well, probably in the SNAPSHOT_FREEZE
> stage.

my assumption was that PM ioctl sends out PM notifications.
see snapshot_open()/snapshot_release().

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-29  9:29     ` Petr Mladek
  2017-05-29 12:12       ` Jan Kara
@ 2017-05-31  7:22       ` Sergey Senozhatsky
  2017-06-28 12:19         ` Petr Mladek
  1 sibling, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-31  7:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

Hello Petr,

On (05/29/17 11:29), Petr Mladek wrote:
[..]
> > +static inline bool console_offload_printing(void)
> > +{
> > +	static struct task_struct *printing_task;
> > +	static unsigned long long lines_printed;
> > +	static unsigned long saved_csw;
> > +
> > +	if (!printk_offloading_enabled())
> > +		return false;
> > +
> > +	if (system_state != SYSTEM_RUNNING || oops_in_progress)
> > +		return false;
> 
> Wow, I really like this test. I was not aware of this variable.
> 
> Well, I would allow using the offload also during boot.
> We have reports when softlockups happened during boot.

hm, interesting. I guess we can tweak this part. e.g.
	"if system_state > SYSTEM_RUNNING then return false"

> > +	/* A new task - reset the counters. */
> > +	if (printing_task != current) {
> > +		lines_printed = 0;
> > +		printing_task = current;
> > +		saved_csw = current->nvcsw + current->nivcsw;
> > +		return false;
> > +	}
> > +
> > +	lines_printed++;
> > +	if (lines_printed < atomic_print_limit)
> > +		return false;
> > +
> > +	if (current == printk_kthread) {
> > +		/*
> > +		 * Reset the `lines_printed' counter, just in case if
> > +		 * printk_kthread is the only process left that would
> > +		 * down() console_sem and call console_unlock().
> > +		 */
> > +		lines_printed = 0;
> > +		return true;
> > +	}
> > +
> > +	/*
> > +	 * A trivial emergency enforcement - give up on printk_kthread if
> > +	 * we can't wake it up. This assumes that `atomic_print_limit' is
> > +	 * reasonably and sufficiently large.
> > +	 */
> > +	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
> > +			saved_csw == (current->nvcsw + current->nivcsw)) {
> > +		printk_enforce_emergency = true;
> > +		pr_crit("Declaring printk emergency mode.\n");
> > +		return false;
> > +	}
> 
> This is interesting way how to detect that the system is in the
> emergency state. The question is how reliable it is.
> 
> It might work only if the woken printk kthread is scheduled in a predictable
> time period.

yep.

> You try to achieve this by setting real time priority

yes, that's the reason.

> (4th patch of this patchset). The question is how this would work:

> First, the real time priority is questionable on its own. Logging
> is important but the real time priority is dangerous. Any "flood"
> of messages will starve all other processes with normal priority.
> It is better than a softlockup but it might cause problems as well.

so I try to minimize the negative impact of RT prio here. printk_kthread
is not special any more. it's an auxiliary kthread that we sometimes
wake_up. the thing is that printk_kthread also must offload at some
point, basically the same `atomic_print_limit' limit applies to it as
well. I think I made a mistake by resetting `lines_printed' in
'current == printk_kthread' branch. what I meant to do was

	if (current == printk_kthread) {
		/*
		 * Do not reset `lines_printed' counter. Force
		 * printk_kthread to offload printing once it goes above
		 * the `atomic_print_limit' limit, but avoid
		 * `printk_enforce_emergency'. `printk_kthread' might be
		 * the only process left that would down() console_sem and
		 * call console_unlock().
		 */
		lines_printed--;
		return true;
	}

so `printk_kthread' will try to up()/down() console_sem once it reached
`atomic_print_limit'. just like any other process.

> Third, if the console_trylock() fails below, it is not guaranteed that
> the other waiter will really continue printing. The console_trylock()
> would fail because someone else called console_lock(), failed to get it,
> and went into sleep. There is no guarantee that it will be waken
> once again.

please explain, why wouldn't it? we call up() after every line we print
once the task exceeds `atomic_print_limit'. you mean there probably
won't be any tasks in console_sem wait list? just for that case we have
'if (current == printk_kthread)' branch. `printk_kthread' can indeed be
the one and only task to down() the console_sem.

>       + slow => async =>
> 		+ might be far behind
> 		+ losing messages
> 		+ showing only ones with high log level
> 		+ possible softlockup
> 		+ not guaranteed when dying

yeah. good summary. the previous async printk implementation...
we officially hate it now.

[..]
> The patch tries to reduce the negative impact by detecting when
> the negative impact is visible and switching back to the sync mode.

um, sorry, no. or I simply don't follow. the patch keeps the existing sync
mode and switches to async mode when the problem can probably show up. we
don't work in async mode by default, only when a task reaches atomic_print_limit.

[..]
>   + I would suggest to detect flood of messages by a difference
>     between log_next_seq and console_seq. We could take into
>     account the number of messages in the buffer
>     (log_next_seq - log_first_seq)

yeah, I think I already posted a patch that would do exactly this
thing. the problem with printk flood is that it's hard to do the
right thing. if you have just one flooding CPU then yes, forcing
that CPU to printk its messages is a good thing. but if you have
at least 2 CPUs flooding logbuf then we can't do anything. if you
force one CPU to print logbuf messages, the other CPU is still
enjoying the benefits and pleasures of fast printk path (just log_store())
which most likely results in lost messages anyway.

> In addition, I think that a semi-sync-async mode might better
> handle the sudden dead. What is it? I have already talked about
> something similar last time.

but this is what the patch does. by default we are in sync mode.
assuming that `atomic_print_limit' is large enough it can take
seconds before we offload printing to printk_kthread. so that
semi-sync-async is what we are trying to have. am I missing something?

and that's why I'm *a bit* (just *really a bit*) less concerned about cases
that we missed. I expect that `atomic_print_limit' usually will be around
thousands of lines, if not tens of thousands, giving enough time to sync
printing.

> Normal printk() should do something like:
> 
>   printk()
>     vprintk_emit()
>       log_store()
>       if (!deferred)
> 	console_trylock()
> 	console_unlock()
> 	  while (num_lines < atomic_print_limit)
> 	    handle message
> 	  up_console_sem()
> 	  if (pending_messages)
> 	    wakeup(printk_kthread)

this is close to what we do. at the same time we have better guarantees.
we don't just wakeup(printk_kthread) and leave. we wait for any other
process to re-take the console_sem. until this happens we can't leave
console_unlock().


> and printk_kthread should do:
> 
>    while() {
> 
>        if (!pending_messages)
> 	 sleep()
> 
>        console_lock()
> 
>        preempt_disable()
>        console_unlock()
> 	 while (pending_message && !need_resched)
> 	   handle message
> 	 up_console_sem()
>        preempt_enable()
>     }

hm. I don't want printk_kthread to be special. just because there are cases
when printk_kthread won't be there. we had too many problems with relying on
printk_kthread in all the corner cases. I want printk_kthread to be just one
extra process that can do the printing for us. if we have X tasks sleeping in
UNINTERRUPTIBLE on console_sem then we better use them; keeping them
in UNINTERRUPTIBLE as long as printk_kthread has pending messages does
no good. those can be user space processes missing user-space watchdog's
heartbeat signals, etc. etc. or there can be another process that calls
printk way too often and basically floods the logbuf. so printk_kthread will
give it a chance to successfully lock the console_sem and slow down a bit.

so the current logic is

1) we print messages synchronously. up until `atomic_print_limit'. when
  we reach `atomic_print_limit' we keep printing messages synchronously,
  but at the same time we begin asking for help from other processes.
  one of those other processes is printk_kthread. but we don't go all
  in on it (that alone doesn't work), and we make sure that one of those
  'other processes' actually locked the console_sem.

2) printk_kthread does the same as any other process. see 1)


> In other words:
> 
>   + printk() should try to get the console and handle a few lines
>     synchronously. It will handle its own message in most cases.
> 
>   + printk_kthread is only a fallback. It is woken when there are
>     unhandled messages. It handles as many messages as
>     possible in its assigned time slot, but it releases
>     console_sem before it goes to sleep.
> 
> 
> The advantages:
> 
>    + all messages are handled synchronously if there is not
>      a flood of messages
> 
>    + if there is a flood of messages, the kthread works only
>      as a fallback; processes that produce the messages
>      are involved in handling the console (natural throttling);
>      the messages appear on the console even when printk_kthread
>      is sleeping; but the softlockup is prevented
> 
> => it should work pretty well in all situations, including the flood
>    of messages and sudden death.
> 
> What do you think, please?

need to think more.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-29 12:12       ` Jan Kara
@ 2017-05-31  7:30         ` Sergey Senozhatsky
  2017-05-31 21:44           ` Andreas Mohr
                             ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-05-31  7:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Andrew Morton, Peter Zijlstra,
	Rafael J . Wysocki, Eric Biederman, Greg Kroah-Hartman,
	Jiri Slaby, Pavel Machek, Andreas Mohr, Tetsuo Handa,
	linux-kernel

Hello Jan,

On (05/29/17 14:12), Jan Kara wrote:
[..]
> Actually I had something very similar in old versions of my patch set. And
> it didn't work very well. The problem was that e.g. sometimes scheduler
> decided that printk kthread should run on the same CPU as the process
> currently doing printing and in such case printk kthread never took over
> printing and the machine locked up due to heavy printing.

hm, interesting.

> > First, the real time priority is questionable on its own. Logging
> > is important but the real time priority is dangerous. Any "flood"
> > of messages will starve all other processes with normal priority.
> > It is better than a softlockup but it might cause problems as well.
> 
> Processes with real-time priority should have well bounded runtime (in
> miliseconds). Printk kthread doesn't have such bounded runtime so it should
> not be a real time process as it could hog the CPU it is running on...

yeah, I can easily make it a normal prio task. at the same time
printk_kthread has 'soft' limits on its execution. it's under the
same constraints as the rest of the processes that do printing.
there can be a random RT task doing console_trylock()->console_unlock(),
so we still can hog CPUs. but, yeah, I don't want printk_kthread to be
special.

> So I think what Petr suggests below is better. Keep normal priority, print
> something to console from the process doing printk() and just wake up
> printk kthread and hope it can print the rest. It is not ideal but unless
> there's a flood of messages there is no regression to current state.

hm, this is very close to what I do in my patch, with some additional
guarantees. because people mostly want to have good old printk. that
"let's hope" part basically doesn't work when it's needed the most. we
had a ton of cases of lost messages in serial logs.

I replied in more details in another mail.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-31  7:30         ` Sergey Senozhatsky
@ 2017-05-31 21:44           ` Andreas Mohr
  2017-06-01  7:21           ` Sergey Senozhatsky
  2017-06-28 12:45           ` Petr Mladek
  2 siblings, 0 replies; 51+ messages in thread
From: Andreas Mohr @ 2017-05-31 21:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On Wed, May 31, 2017 at 04:30:59PM +0900, Sergey Senozhatsky wrote:
> Hello Jan,
> 
> On (05/29/17 14:12), Jan Kara wrote:
> [..]
> > Actually I had something very similar in old versions of my patch set. And
> > it didn't work very well. The problem was that e.g. sometimes scheduler
> > decided that printk kthread should run on the same CPU as the process
> > currently doing printing and in such case printk kthread never took over
> > printing and the machine locked up due to heavy printing.
> 
> hm, interesting.

Not too knowledgeable, but just my thoughts:
was this on a non-preemption kernel (!CONFIG_PREEMPT)?
If so, perhaps we are missing some non-preempt-case yield somewhere.
I'd think that it really *cannot* be that there are relevant processes
(the printk kthread) which the functionality invoked by a client process *does*
know about (read: the printk() APIs), yet somehow those printk() APIs then
do not properly cause a yield
[thus possibly and hopefully: to the printk kernel thread]
(quite possibly after some certain amount of printk resource use exceeded).
--> implementation bug??
(due to staying at execution of a client process, despite that one making
massive calls to printk() APIs, where we then ought to be able to
properly set up a preemption point)

Pseudo code:

void printk_use_annotate()
{
  if (count_printk_since_last_servicing > 30)
    yield();
}

Note that a preemption yield action obviously will not reliably guarantee
hitting the printk thread
(unless there's an API for directed yield),
thus this mechanism has to be qualified as *unreliable*,
thus this has to be taken into account by implementation code.

But all this reasoning is a strong indication that
there might be no
properly precisely handshaked producer/consumer protocol -
this would e.g. be the case for a proper select() loop
with event handling where the partner would be woken up *precisely*
(via POSIX 3-way handshake!)
when the condition for wakeup is fulfilled
(read: too many pending printk()s or some such),
rather than having some imprecise (sleep-type / "oh I do not have much to
do any more now") "handwaving" scheduling.


> yeah, I can easily make it a normal prio task. at the same time
> printk_kthread has 'soft' limits on its execution. it's under the
> same constraints as the rest of the processes that do printing.
> there can be a random RT task doing console_trylock()->console_unlock(),
> so we still can hog CPUs. but, yeah, I don't want printk_kthread to be
> special.

[printk_thread not special] Indeed. Think clean dependency abstraction.
printk kthread (i.e., that "more special" thread worker context)
should merely be *one* user of these printk queue servicing APIs.
And those servicing APIs should simply be invocable by anyone
who would want to take part in
getting the printk queue load properly serviced,
without any special handwaving.

[I can easily make all those "easy" wishlist requests -
I did not have to painfully design all that stuff ;-)]

Andreas Mohr


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-31  7:30         ` Sergey Senozhatsky
  2017-05-31 21:44           ` Andreas Mohr
@ 2017-06-01  7:21           ` Sergey Senozhatsky
  2017-06-01  7:23             ` Sergey Senozhatsky
                               ` (2 more replies)
  2017-06-28 12:45           ` Petr Mladek
  2 siblings, 3 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-01  7:21 UTC (permalink / raw)
  To: Jan Kara, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On (05/31/17 16:30), Sergey Senozhatsky wrote:
> On (05/29/17 14:12), Jan Kara wrote:
> [..]
> > Actually I had something very similar in old versions of my patch set. And
> > it didn't work very well. The problem was that e.g. sometimes scheduler
> > decided that printk kthread should run on the same CPU as the process
> > currently doing printing and in such case printk kthread never took over
> > printing and the machine locked up due to heavy printing.
> 
> hm, interesting.

that's a tricky problem to deal with.



... so maybe we can have per-CPU printk kthreads then

	static DEFINE_PER_CPU(struct task_struct *, printk_kthread);


SMP hotplug threads, to be precise, the same way as watchdog has it. and
then during offloading we can wake_up any printk_kthread that is knowingly
not from this-CPU, all of them, let them compete for the console_sem.

just a quick idea.

thoughts?

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-01  7:21           ` Sergey Senozhatsky
@ 2017-06-01  7:23             ` Sergey Senozhatsky
  2017-06-01  9:20             ` Sergey Senozhatsky
  2017-06-28 13:17             ` Petr Mladek
  2 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-01  7:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/01/17 16:21), Sergey Senozhatsky wrote:
> On (05/31/17 16:30), Sergey Senozhatsky wrote:
> > On (05/29/17 14:12), Jan Kara wrote:
> > [..]
> > > Actually I had something very similar in old versions of my patch set. And
> > > it didn't work very well. The problem was that e.g. sometimes scheduler
> > > decided that printk kthread should run on the same CPU as the process
> > > currently doing printing and in such case printk kthread never took over
> > > printing and the machine locked up due to heavy printing.
> > 
> > hm, interesting.
> 
> that's a tricky problem to deal with.
> 
> 
> 
> ... so maybe we can have per-CPU printk kthreads then
> 
> 	static DEFINE_PER_CPU(struct task_struct *, printk_kthread);
> 
> 
> SMP hotplug threads, to be precise, the same way as watchdog has it. and
> then during offloading we can wake_up any printk_kthread that is knowingly
> not from this-CPU, all of them, let them compete for the console_sem.
             ^^^  *OR* all of them

	     -ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-01  7:21           ` Sergey Senozhatsky
  2017-06-01  7:23             ` Sergey Senozhatsky
@ 2017-06-01  9:20             ` Sergey Senozhatsky
  2017-06-28 13:17             ` Petr Mladek
  2 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-01  9:20 UTC (permalink / raw)
  To: Jan Kara, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel, Sergey Senozhatsky

On (06/01/17 16:21), Sergey Senozhatsky wrote:
> On (05/31/17 16:30), Sergey Senozhatsky wrote:
> > On (05/29/17 14:12), Jan Kara wrote:
> > [..]
> > > Actually I had something very similar in old versions of my patch set. And
> > > it didn't work very well. The problem was that e.g. sometimes scheduler
> > > decided that printk kthread should run on the same CPU as the process
> > > currently doing printing and in such case printk kthread never took over
> > > printing and the machine locked up due to heavy printing.
> > 
> > hm, interesting.
> 
> that's a tricky problem to deal with.
> 
> 
> 
> ... so may be we can have per-CPU printk kthreads then
> 
> 	static DEFINE_PER_CPU(struct task_struct *, printk_kthread);
> 
> 
> SMP hotplug threads, to be precise, the same way as watchdog has it. and
> then during offloading we can wake_up any printk_kthread that is knowingly
> not from this-CPU, all of them, let them compete for the console_sem.
> 
> just a quick idea.
> 
> thoughts?

and we, of course, can provide a user space knob to enforce a cpumask
for printk_kthreads, if someone doesn't want to have too many printk
kthreads (e.g. a system with 200 CPUs). so we can wake up printk_kthreads
only on the given CPUs.

so I hacked a quick and dirty version of printing offloading using smp
threads, ran some tests and it seems to work better than the version
with a single printk_kthread.

overall this looks like the right direction. to me. well, I may be
wrong.

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 5/5] printk: register PM notifier
  2017-05-31  6:42     ` Sergey Senozhatsky
@ 2017-06-27 14:45       ` Petr Mladek
  0 siblings, 0 replies; 51+ messages in thread
From: Petr Mladek @ 2017-06-27 14:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Wed 2017-05-31 15:42:24, Sergey Senozhatsky wrote:
> On (05/30/17 11:55), Petr Mladek wrote:
> > BTW: Do you know about some locations that would need to be
> > patched explicitly, for example, kexec?
> 
> a tricky question and I need to think more. I'm a bit overloaded
> with some other stuff now, sorry for long replies.

Same here.

> in case of kexec_image->preserve_context my assumption was that
> PM would send out PM notifications. but it seems that I need to have
> a second look. quite possible we will have to register yet another
> PM notifier - add syscore_ops to syscore_ops_list via
> register_syscore_ops() - the one that would be called from
> syscore_suspend().

syscore_suspend() is also called in many other locations. Also it
might be too late. For example, in the chain:

  hibernate()
   power_down()
     hibernation_platform_enter()
       syscore_suspend()

PM_HIBERNATION_PREPARE notifiers are called much earlier
in hibernate().


> the other branch invokes kernel_restart_prepare(), which updates
> `system_state' variable and we test it in console_offload_printing().

Sounds promising.


> some other cases also update `system_state'. the exception is maybe
> sysrq emergency restart. but I'm a bit uncertain here. on one hand the
> expected emergency restart behavior is "just reboot the system, OK?",
> on the other hand there are people who expect kernel log messages to
> be flushed before that "just reboot the system" thing.
> 
> > Some old versions of this patch touched console_suspend().
> > This brought me to snapshot_ioctl(). It looks like an
> > API that allows to create snapshots from user space.
> > I wonder if we should switch to the emergency mode
> > there are well, probably in the SNAPSHOT_FREEZE
> > stage.
> 
> my assumption was that PM ioctl sends out PM notifications.
> see snapshot_open()/snapshot_release().

Great catch! I have missed this. It seems that the open()
function really has to be called before any ioctl() stuff
and we are on the safe side.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-31  7:22       ` Sergey Senozhatsky
@ 2017-06-28 12:19         ` Petr Mladek
  2017-06-29  6:26           ` Andreas Mohr
  2017-06-29  7:33           ` Sergey Senozhatsky
  0 siblings, 2 replies; 51+ messages in thread
From: Petr Mladek @ 2017-06-28 12:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Wed 2017-05-31 16:22:33, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (05/29/17 11:29), Petr Mladek wrote:
> [..]
> > > +static inline bool console_offload_printing(void)
> > > +{
> > > +	/*
> > > +	 * A trivial emergency enforcement - give up on printk_kthread if
> > > +	 * we can't wake it up. This assumes that `atomic_print_limit' is
> > > +	 * reasonably and sufficiently large.
> > > +	 */
> > > +	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
> > > +			saved_csw == (current->nvcsw + current->nivcsw)) {
> > > +		printk_enforce_emergency = true;
> > > +		pr_crit("Declaring printk emergency mode.\n");
> > > +		return false;
> > > +	}
> > 
> > It might work only if the woken printk kthread is scheduled in a predictable
> > time period.
> 
> > You try to achieve this by setting real time priority
> 
> yes, that's the reason.
> 
> > (4th patch of this patchset). The question is how this would work:
> 
> > First, the real time priority is questionable on its own. Logging
> > is important but the real time priority is dangerous. Any "flood"
> > of messages will starve all other processes with normal priority.
> > It is better than a softlockup but it might cause problems as well.
> 
> so I try to minimize the negative impact of RT prio here. printk_kthread
> is not special any more. it's an auxiliary kthread that we sometimes
> wake_up. the thing is that printk_kthread also must offload at some
> point, basically the same `atomic_print_limit' limit applies to it as
> well.

You might call cond_resched() outside console_unlock(). But you have
to keep printk_kthread in runnable state as long as there are pending
messages. Then scheduler will always prefer this RT task over non-RT
tasks. Or am I wrong?

The more I think about it, the more I am persuaded that RT priority
is no no for printk_kthread.


> > Third, if the console_trylock() fails below, it is not guaranteed that
> > the other waiter will really continue printing. The console_trylock()
> > would fail because someone else called console_lock(), failed to get it,
> > and went into sleep. There is no guarantee that it will be waken
> > once again.
> 
> please explain, why wouldn't it? we call up() after every line we print
> once the task exceeds `atomic_print_limit'. you mean there probably
> won't be any tasks in console_sem wait list? just for that case we have
> 'if (current == printk_kthread)' branch. `printk_kthread' can indeed be
> the one and only task to down() the console_sem.

I think that I was unclear and talked only about one possibility.

console_trylock() fails if anyone else already owns the lock
or if anyone else is in the wait list. I talked about the 2nd
possibility. But both ways look tricky:


1. We are talking about flood of messages. printk() does
   console_trylock() and does not add itself into the wait list.

   It means that another printk() could take over the console
   stuff only when its console_trylock() is called in the small
   window when console_unlock() releases the lock:

   CPU0				CPU1

   console_unlock()		printk()
    //handle few lines		// store message into log buffer
    up_console_sem()
				console_trylock()

    if (pending_output)
      console_trylock()

   The printk() on CPU1 got console_lock only because it called
   console_trylock() in the right time. This looks too tricky
   to me.

   Note that the interleaving of the two processes is complicated
   also because they both take logbuf_lock. It might affect the
   chances as well (in a positive or negative way).


2. There are only few functions that call console_lock() and
   add itself into the wait queue.

   They will get the console_lock directly only when they race
   the same way as printk() described above.

   The more likely scenario is that they will fail to get
   console_lock because someone else is in console_unlock().
   They will add itself into the wait queue and get into
   sleep.

   Then up_console_sem() will set TASK_RUNNING for the waiting
   process and add it to the run queue. The console_trylock()
   in console_unlock() will fail because the lock is assigned
   to the waiter. But the waiter is not running. It is just
   in the run queue. There is no guarantee how soon it will
   get scheduled. It might not happen at all if the system
   is in bad state.


> >       + slow => async =>
> > 		+ might be far behind
> > 		+ losing messages
> > 		+ showing only ones with high log level
> > 		+ possible softlockup
> > 		+ not guaranteed when dying
> 
> yeah. good summary. the previous async printk implementation...
> we officially hate it now.
> 
> [..]
> > The patch tries to reduce the negative impact by detecting when
> > the negative impact is visible and switching back to the sync mode.
> 
> um, sorry, no. or I simply don't follow. the patch keeps the existing sync
> mode and switches to async mode when the problem probably can show up. we
> don't work in async mode by default. only when task reaches atomic_print_limit.

I talked about situation where the async mode was active and we tried
to detect emergency mode and forced the sync mode again.


> [..]
> >   + I would suggest to detect flood of messages by a difference
> >     between log_next_seq and console_seq. We could take into
> >     account the number of messages in the buffer
> >     (log_next_seq - log_first_seq)
> 
> yeah, I think I already posted a patch that would do exactly this
> thing. the problem with printk flood is ... if you have at least 2 CPUs
> flooding logbuf ... other CPU is still enjoying ... just log_store()

Just thinking out loud. I wonder if we could somehow detect this and
throttle such processes in printk(). I mean to make them spin there
for a bit ;-) Well, I would leave this for later optimizations.


> > In addition, I think that a semi-sync-async mode might better
> > handle the sudden dead. What is it? I have already talked about
> > something similar last time.
> 
> but this is what the patch does. but default we are in sync mode.
> assuming that `atomic_print_limit' is large enough it can take
> seconds before we offload printing to printk_kthread. so that
> semi-sync-async is what we are trying to have. am I missing something?

Perfect. I am really sorry, I was too deep in thinking and missed how
close the two proposals actually were.


> and that's why I'm *a bit* (just *really a bit*) less concerned about cases
> that we missed. I expect that `atomic_print_limit' usually will be around
> thousands of lines, if not tens of thousands, giving enough time to sync
> printing.

Well, the limit must be low enough to prevent soft lockup. In fact, it
should be low enough to keep an almost normal scheduling rhythm. It does
not make sense to block "innocent" processes for seconds when they do
an important job and just by chance printed their rare message.

On the other hand, it should be big enough to allow flushing
some average log, e.g. lockdep, sysrq report.

I might be too optimistic. Maybe it is not possible to tune it
this way.

> > Normal printk() should do something like:
> > 
> >   printk()
> >     vprintk_emit()
> >       log_store()
> >       if (!deferred)
> > 	console_trylock()
> > 	console_unlock()
> > 	  while (num_lines < atomic_print_limit)
> > 	    handle message
> > 	  up_console_sem()
> > 	  if (pending_messages)
> > 	    wakeup(printk_kthread)
> 
> this is close to what we do.

I see it now.

> at the same we have better guarantees.
> we don't just wakeup(printk_kthread) and leave. we wait for any other
> process to re-take the console_sem. until this happens we can't leave
> console_unlock().

And this is my problem. I am scared of the waiting. It is very hard
to predict, especially without RT priority. But it is tricky anyway,
see above.

It makes it very hard to tune the emergency situation detection.
If we enable the emergency mode too easily, the soft lockups will
be back again.

In addition, it adds another dependency on the scheduler behavior.
It is a can of worms as explained by Jan Kara in another mail.

> 
> > and printk_kthread should do:
> > 
> >    while() {
> > 
> >        if (!pending_messages)
> > 	 sleep()
> > 
> >        console_lock()
> > 
> >        preempt_disable()
> >        console_unlock()
> > 	 while (pending_message && !need_resched)
> > 	   handle message
> > 	 up_console_sem()
> >        preempt_enable()
> >     }
> 
> hm. I don't want printk_kthread to be special. just because there are cases
> when printk_kthread won't be there. we had too many problems with relying on
> printk_kthread in all the corner cases. I want printk_kthread to be just one
> extra process that can do the printing for us.
> if we have X tasks sleeping in UNINTERRUPTIBLE on console_sem then
> we better use them; keeping them in UNINTERRUPTIBLE as long as
> printk_kthread has pending messages does no good.

I am a bit confused by this paragraph. What exactly makes the
printk_kthread special in my proposal? Is the main problem
that it checks need_resched instead of atomic_print_limit?

You are right that it should not block other waiters. And I forgot
to take other console_lock() callers into account.

My motivation was exactly the opposite. I wanted to reduce the
unnecessary up_console_sem()/down_console_sem() dance and the
console output slowdown it causes. The printk_kthread
is special after all because its only duty is to flush the console.
Also it is the only process where we know for sure that it does
the job in normal context. It looked to me that it made sense
to use its assigned time slot as much as possible.

Anyway, this is just an optimization. Let's handle printk_kthread
like any other process for now.


Summary:

Our two proposals are very close after all. I suggest to make
the following changes in your patch:

  + Remove the waiting for another console_lock owner. It is
    too tricky.

  + Instead try to reduce sleeping with console_lock taken.
    Remove that cond_resched() from console_unlock(). Maybe
    even call console_trylock()/console_unlock() with
    disabled preemption again. It will increase the chance
    that anyone else will continue handling the console.

  + keep the code to force sync mode in known emergency
    situations (halt, suspend, ...).


This way we should be good in all situations:

  + sudden death because we are in sync mode until atomic_limit
    is reached

  + flood of messages because printk() does not sleep with
    console_lock taken. Either someone is flushing the console
    or any printk() call could continue flushing the console.

  + critical situations because we force the sync mode
    explicitly

How does that sound, please?

Best Regards,
Petr

PS: I am sorry for the late reply. I was deep in other things.
I wanted to answer this with a clean head.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-05-31  7:30         ` Sergey Senozhatsky
  2017-05-31 21:44           ` Andreas Mohr
  2017-06-01  7:21           ` Sergey Senozhatsky
@ 2017-06-28 12:45           ` Petr Mladek
  2 siblings, 0 replies; 51+ messages in thread
From: Petr Mladek @ 2017-06-28 12:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Wed 2017-05-31 16:30:59, Sergey Senozhatsky wrote:
> Hello Jan,
> 
> On (05/29/17 14:12), Jan Kara wrote:
> [..]
> > Actually I had something very similar in old versions of my patch set. And
> > it didn't work very well. The problem was that e.g. sometimes scheduler
> > decided that printk kthread should run on the same CPU as the process
> > currently doing printing and in such case printk kthread never took over
> > printing and the machine locked up due to heavy printing.
> 
> hm, interesting.
> 
> > > First, the real time priority is questionable on its own. Logging
> > > is important but the real time priority is dangerous. Any "flood"
> > > of messages will starve all other processes with normal priority.
> > > It is better than a softlockup but it might cause problems as well.
> > 
> > Processes with real-time priority should have well bounded runtime (in
> > milliseconds). Printk kthread doesn't have such bounded runtime so it should
> > not be a real time process as it could hog the CPU it is running on...
> 
> yeah, I can easily make it a normal prio task. at the same time
> printk_kthread has 'soft' limits on its execution. it's under the
> same constraints as the rest of the processes that do printing.
> there can be a random RT task doing console_trylock()->console_unlock(),
> so we still can hog CPUs. but, yeah, I don't want printk_kthread to be
> special.

There is a difference. The only job of printk_kthread is to flush
messages to the console. It is scheduled as long as there
are pending messages. Any other RT task has other purpose. It stops
being scheduled when the other purpose is handled.

I am sorry if this was obvious after reading my other reply.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-01  7:21           ` Sergey Senozhatsky
  2017-06-01  7:23             ` Sergey Senozhatsky
  2017-06-01  9:20             ` Sergey Senozhatsky
@ 2017-06-28 13:17             ` Petr Mladek
  2017-06-29  7:40               ` Sergey Senozhatsky
  2 siblings, 1 reply; 51+ messages in thread
From: Petr Mladek @ 2017-06-28 13:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Thu 2017-06-01 16:21:02, Sergey Senozhatsky wrote:
> On (05/31/17 16:30), Sergey Senozhatsky wrote:
> > On (05/29/17 14:12), Jan Kara wrote:
> > [..]
> > > Actually I had something very similar in old versions of my patch set. And
> > > it didn't work very well. The problem was that e.g. sometimes scheduler
> > > decided that printk kthread should run on the same CPU as the process
> > > currently doing printing and in such case printk kthread never took over
> > > printing and the machine locked up due to heavy printing.
> > 
> > hm, interesting.
> 
> that's a tricky problem to deal with.
> 
> 
> 
> ... so may be we can have per-CPU printk kthreads then
> 
> 	static DEFINE_PER_CPU(struct task_struct *, printk_kthread);
> 
> 
> SMP hotplug threads, to be precise, the same way as watchdog has it. and
> then during offloading we can wake_up any printk_kthread that is knowingly
> not from this-CPU, all of them, let them compete for the console_sem.
> 
> just a quick idea.
> 
> thoughts?

I am not sure if this is worth the resources. I think that one
big win of workqueues was that it reduced the amount of running
per-CPU kthreads. There are systems with thousands of CPUs.

I am a bit afraid to use workqueues for flushing consoles.
It would be another dependency and another risk.

Otherwise, per-CPU kthreads/workqueues primarily handle per-CPU
resources. But printk_kthread would handle consoles that
need to be serialized anyway. It sounds weird to have a
per-CPU task just to increase the chance that it will
get scheduled.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-28 12:19         ` Petr Mladek
@ 2017-06-29  6:26           ` Andreas Mohr
  2017-06-29  7:00             ` Andreas Mohr
  2017-06-29  7:33           ` Sergey Senozhatsky
  1 sibling, 1 reply; 51+ messages in thread
From: Andreas Mohr @ 2017-06-29  6:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On Wed, Jun 28, 2017 at 02:19:25PM +0200, Petr Mladek wrote:
> On Wed 2017-05-31 16:22:33, Sergey Senozhatsky wrote:
> > so I try to minimize the negative impact of RT prio here. printk_kthread
> > is not special any more. it's an auxiliary kthread that we sometimes
> > wake_up. the thing is that printk_kthread also must offload at some
> > point, basically the same `atomic_print_limit' limit applies to it as
> > well.
> 
> You might call cond_resched() outside console_unlock(). But you have
> to keep printk_kthread in runnable state as long as there are pending
> messages. Then scheduler will always prefer this RT task over non-RT
> tasks. Or am I wrong?

Not sure whether I mentioned this advice before, but:
I believe we should strive to achieve a design where
cond_resched() etc. is *not* needed -
cond_resched() / sleep() etc. likely are signs of extended code smell:
one should strive to achieve handling which has
a properly *precisely*/*strictly* handshaked
request/response (producer/consumer) communication protocol.
I.e., IPC mechanism objects (mutex etc.).
That way, one avoids
the polling-type, imprecise-type "are we there yet? is it our job now?" handling
and instead uses
properly precise (thus, *not* needlessly inefficient!)
scheduler wakeup mechanisms.

Thus, it's "merely" (hah!) a matter of
designing handling where responsibilities / transitions are clearly spelt out,
thus end up as
properly precisely implemented notifications via IPC mechanisms.


For a very simple setup (which quite possibly cannot be done this easily!),
things could be:

printk_work_wakeup_one_worker()
{
  reliably_notify_only_one_available_computing_hardware_resource(); <-- kernel mechanism available, I think
}

void printk_work_dump_my_package(work_count_max)
{
  while (++work_count < work_count_max)
    printk_work_dump_element();
}


void printk_work_dump_handler(work_count_max)
{
  --> added mutex to have continuation check be done within *inner* atomic handling section (avoid race window).
  printk_work_dump_my_package(work_count_max);

  take_mutex();
  bool all_done = !have_payload_remain;
  bool need_continuation = !(all_done);
  if (need_continuation)
    printk_work_wakeup_one_worker();
  release_mutex();
}


worker thread frame function:
WORKER printk_work_worker()
{
  for (;;)
  {
    switch(select())
      case dump_requested:
        printk_work_dump_handler(printk_work_count_max_kthread);
      case termination_requested:
        return;
  }
}

/* tasked_cpus = all_cpus; */
tasked_cpus = all_cpus / 4;
for(tasked_cpus)
{
  new_kthread(printk_work_worker());
}



printk_impl()
{
  printk_queue_element(...);
  printk_work_dump_handler(printk_work_count_max_immediate);
}

Very Q&D thoughts, but might be helpful...


(ermm, however launching #cpus workers most likely is useless,
since schedulable decision-making does *not* depend on #cpus -
if any computing resource is available, this *can* be executed, thus
quite likely only one worker is needed anyway - no, two, since
one worker needs to be able to wake up precisely *one* other
to have things precisely continue on *another* computing resource!)

Oh, and this doesn't implement
(and especially not reliably/atomically race-window-less)
the case of
having another activity trigger another printk queuing
and thus (potentially - but not necessarily!!) another package dump activity
while some worker activity already is ongoing.
I.e. we've got a race window in the (multi-)use of printk_work_dump_handler()
(which should be solvable, though).

Oh, and rather than doing some specific work_count limit comparisons,
it might actually be more elegant instead to
bundle worker work packages in *advance*,
to achieve simple/isolated/separate submission to
whichever worker one would prefer; inner handling would then simply do:

work_handler()
{
  while (!my_package_done)
    dump_package_element();
}

But splitting into separate work packages bears the risk of
illegal reordering of printk payload elements, thus
most likely one should *not* do this and instead keep doing
simple work_count checks on a *global*/*shared* printk queue payload
(insertion submission of further elements into this queue
must still be possible at any time
without (excessive) blocking, though, of course).

while (can_continue_dumping)
{
payload_mutex_begin();
element = grab_element();
payload_mutex_end();
dump_element(element);
}



payload_mutex_begin();
queue_element(element);
payload_mutex_end();
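Made concrete, the shared-store scheme sketched above could look like the following userspace C. A pthread mutex stands in for the kernel-side lock, and all names are illustrative rather than taken from the patch set:

```c
#include <pthread.h>

#define STORE_SIZE 64

static pthread_mutex_t store_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *store[STORE_SIZE];
static int head, tail;	/* monotonic: tail = next insert, head = next dump */

/* producer side: reliable insertion at any moment */
static void queue_element(const char *msg)
{
	pthread_mutex_lock(&store_lock);
	if (tail - head < STORE_SIZE)
		store[tail++ % STORE_SIZE] = msg;
	/* else: store full; an emergency LIMIT EXCEEDED marker would go here */
	pthread_mutex_unlock(&store_lock);
}

/* consumer side: drain at most work_count_max elements, preserving
 * order; returns the number of elements actually dumped */
static int dump_elements(int work_count_max, void (*dump)(const char *))
{
	int done = 0;

	while (done < work_count_max) {
		const char *msg;

		pthread_mutex_lock(&store_lock);
		msg = (head < tail) ? store[head++ % STORE_SIZE] : NULL;
		pthread_mutex_unlock(&store_lock);
		if (!msg)
			break;
		dump(msg);
		done++;
	}
	return done;
}
```

Producers never block on the consumer, ordering is preserved because there is a single global store, and the bounded drain loop is what keeps any single caller from being stuck flushing forever.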

HTH,

Andreas Mohr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-29  6:26           ` Andreas Mohr
@ 2017-06-29  7:00             ` Andreas Mohr
  0 siblings, 0 replies; 51+ messages in thread
From: Andreas Mohr @ 2017-06-29  7:00 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael J . Wysocki, Eric Biederman, Greg Kroah-Hartman,
	Jiri Slaby, Pavel Machek, Tetsuo Handa, linux-kernel

On Thu, Jun 29, 2017 at 08:26:20AM +0200, Andreas Mohr wrote:
> On Wed, Jun 28, 2017 at 02:19:25PM +0200, Petr Mladek wrote:
> > On Wed 2017-05-31 16:22:33, Sergey Senozhatsky wrote:
> > > so I try to minimize the negative impact of RT prio here. printk_kthread
> > > is not special any more. it's an auxiliary kthread that we sometimes
> > > wake_up. the thing is that printk_kthread also must offload at some
> > > point, basically the same `atomic_print_limit' limit applies to it as
> > > well.
> > 
> > You might call cond_resched() outside console_unlock(). But you have
> > to keep printk_kthread in runnable state as long as there are pending
> > messages. Then scheduler will always prefer this RT task over non-RT
> > tasks. Or am I wrong?
> 
> Not sure whether I mentioned this advice before, but:
> I believe we should strive to achieve a design where
> cond_resched() etc. is *not* needed -
> cond_resched() / sleep() etc. likely are signs of extended code smell:
> one should strive to achieve handling which has
> a properly *precisely*/*strictly* handshaked
> request/response (producer/consumer) communication protocol.
> I.e., IPC mechanism objects (mutex etc.).
> That way, one avoids
> the polling-type, imprecise-type "are we there yet? is it our job now?" handling
> and instead uses
> properly precise (thus, *not* needlessly inefficient!)
> scheduler wakeup mechanisms.
> 
> Thus, it's "merely" (hah!) a matter of
> designing handling where responsibilities / transitions are clearly spelt out,
> thus end up as
> properly precisely implemented notifications via IPC mechanisms.

Big picture:

IMHO (I might be missing something, or even more)
this still is a simple producer/consumer type scenario:
- you've got any producer inserting payload elements into a *shared* (*global*) store
  shared store --> store mutex needed
- reordering is very undesirable --> keep as one *shared*/*global* data store
- producer side(!): in case of hitting a max limit, queuing of elements
  is discarded completely (emergency situation), BUT with a printk marker element LIMIT EXCEEDED
- consumer side:
  - you've got one central handler which
    simply knows nothing other (SEPARATION OF CONCERNS!) than
    how to reliably dequeue and dump (to console) elements,
    in *any* processing context
- producer side (as a special shortcut handling) will *also* subsequently turn to some consumption stuff
  "dequeue elements until a limit is reached"
  (here: with its special direct-context limit count)
- consumption side is specially constructed
  to have properly atomic inner handling of
  decision-making who is the one to dequeue the store
  at a particular moment in time


Thus, you want:
- reliable insertion at-any-moment
- one consistent/global/shared store
- properly precise scheduler decision-making on how the consumer side is done
- and of course keep concerns (printk user-side handling, printk payload
  element store handling, *thread-specific* worker context handling) properly
implementation-separate

I currently don't see any other complications to this.

HTH,

Andreas Mohr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-28 12:19         ` Petr Mladek
  2017-06-29  6:26           ` Andreas Mohr
@ 2017-06-29  7:33           ` Sergey Senozhatsky
  2017-06-29 11:24             ` Tetsuo Handa
                               ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-29  7:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/28/17 14:19), Petr Mladek wrote:
[..]
> > so I try to minimize the negative impact of RT prio here. printk_kthread
> > is not special any more. it's an auxiliary kthread that we sometimes
> > wake_up. the thing is that printk_kthread also must offload at some
> > point, basically the same `atomic_print_limit' limit applies to it as
> > well.
> 
> You might call cond_resched() outside console_unlock(). But you have
> to keep printk_kthread in runnable state as long as there are pending
> messages. Then scheduler will always prefer this RT task over non-RT
> tasks. Or am I wrong?

if we try to offload from IRQ->console_unlock() (or with preemption
disabled, etc. etc.) and the scheduler decides to enqueue printk_kthread
on the same CPU, then no offloading will take place. I can reproduce
it on my system. we need to play some affinity games, I think. but
there are corner cases, once again.


> The more I think about it, the more I am persuaded that RT priority
> is no no for printk_kthread.

yeah, in fact, it didn't work as expected. so I dropped that idea
some time ago.

[..]
> > at the same we have better guarantees.
> > we don't just wakeup(printk_kthread) and leave. we wait for any other
> > process to re-take the console_sem. until this happens we can't leave
> > console_unlock().
> 
> And this is my problem. I am scared of the waiting. It is very hard
> to predict, especially without RT priority. But it is tricky anyway,
> see above.

but.....
the opposite possibility is that messages either won't be printed
soon (until next printk or console_unlock()) or won't be printed
ever at all (in case of sudden system death). I don't think it's
a good alternative.

[..]
> > hm. I don't want printk_kthread to be special. just because there are cases
> > when printk_kthread won't be there. we had too many problems with relying on
> > printk_kthread in all the corner cases. I want printk_kthread to be just one
> > extra process that can do the printing for us.
> > if we have X tasks sleeping in UNINTERRUPTIBLE on console_sem then
> > we better use them; keeping them in UNINTERRUPTIBLE as long as
> > printk_kthread has pending messages does no good.
> 
> I am a bit confused by this paragraph. What exactly makes the
> printk_kthread special in my proposal?

the fact that we rely on it and on the scheduler. maybe I'm
simply misunderstanding you, sorry if so, but it seems that you both
want and don't want to depend on the scheduler at the same time. I choose
not to depend on it. and for this choice to become reasonable we need
to preserve the existing 'direct' print out guarantees. and yes, I
understand that sometimes it may cause good old lockups.



> In addition, it adds another dependency on the scheduler behavior.
> It is a can of worms as explained by Jan Kara in another mail.

and that's exactly why "wake_up() and leave console_unlock()" is not
going to fly, is it? what am I missing?


[..]
> Our two proposals are very close after all. I suggest to make
> the following changes in your patch:
> 
>   + Remove the waiting for another console_lock owner. It is
>     too tricky.

we lose the printing guarantees this way. what if printk_kthread
doesn't wake up after all? the whole point of this design twist
(and previous discussions) was that people spoke up and said that
they want printk to do the thing it was doing for decades. even if
it would cause lockup reports sometimes (but it doesn't seem to be
such a common problem after all. how many people see printk lockup
reports more or less regularly?).

>   + Instead try to reduce sleeping with console_lock taken.
>     Remove that cond_resched() from console_unlock(). Maybe
>     even call console_trylock()/console_unlock() with
>     disabled preemption again. It will increase the chance
>     that anyone else will continue handling the console.

console_unlock() should run with preemption disabled, yes.

>   + keep the code to force sync mode in known emergency
>     situations (halt, suspend, ...).
> 
> 
> This way we should be good in all situations:
> 
>   + sudden death because we are in sync mode until atomic_limit
>     is reached

what if sudden death happens right after wake_up(printk_kthread)?
we can't just leave console_unlock().

>   + flood of messages because printk() does not sleep with
>     console_lock taken. Either someone is flushing console
>     or any printk() call could continue flushing the console.
> 
>   + critical situations because we force the sync mode
>     explicitly

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-28 13:17             ` Petr Mladek
@ 2017-06-29  7:40               ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-29  7:40 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Steven Rostedt,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/28/17 15:17), Petr Mladek wrote:
> On Thu 2017-06-01 16:21:02, Sergey Senozhatsky wrote:
> > On (05/31/17 16:30), Sergey Senozhatsky wrote:
> > > On (05/29/17 14:12), Jan Kara wrote:
> > > [..]
> > > > Actually I had something very similar in old versions of my patch set. And
> > > > it didn't work very well. The problem was that e.g. sometimes scheduler
> > > > decided that printk kthread should run on the same CPU as the process
> > > > currently doing printing and in such case printk kthread never took over
> > > > printing and the machine locked up due to heavy printing.
> > > 
> > > hm, interesting.
> > 
> > that's a tricky problem to deal with.
> > 
> > 
> > 
> > ... so may be we can have per-CPU printk kthreads then
> > 
> > 	static DEFINE_PER_CPU(struct task_struct *, printk_kthread);
> > 
> > 
> > SMP hotplug threads, to be precise, the same way as watchdog has it. and
> > then during offloading we can wake_up any printk_kthread that is knowingly
> > not from this-CPU, all of them, let them compete for the console_sem.
> > 
> > just a quick idea.
> > 
> > thoughts?
> 
> I am not sure if this is worth the resources. I think that one
> big win of workqueues was that it reduced the number of running
> per-CPU kthreads. There are systems with thousands of CPUs.
> 
> I am a bit afraid to use workqueues for flushing consoles.
> It would be another dependency and another risk.
> 
> Otherwise, per-CPU kthreads/workqueues primary handle per-CPU
> resources. But printk_kthread would handle consoles that
> need to be serialized anyway. It sounds weird to have
> per-CPU task just to increase the chance that it will
> get scheduled.

yeah. was just a quick idea. it has some 'interesting' options, tho.
I'll reply in another thread.

the waste of resources argument is somewhat interesting. I'm not
arguing, and agree that a per-CPU kthread for printk seems like
massive overkill. the point is - I think that 99.999% of
the time the printk_safe and printk_nmi buffers are not needed. they
simply waste memory. on $BIG systems that, once again, can be
huge. so in terms of resources printk probably must do better,
already.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-29  7:33           ` Sergey Senozhatsky
@ 2017-06-29 11:24             ` Tetsuo Handa
  2017-06-30  7:01             ` Sergey Senozhatsky
  2017-06-30 11:54             ` Petr Mladek
  2 siblings, 0 replies; 51+ messages in thread
From: Tetsuo Handa @ 2017-06-29 11:24 UTC (permalink / raw)
  To: sergey.senozhatsky.work, pmladek
  Cc: sergey.senozhatsky, rostedt, jack, akpm, peterz, rjw, ebiederm,
	gregkh, jslaby, pavel, andi, linux-kernel

Sergey Senozhatsky wrote:
> On (06/28/17 14:19), Petr Mladek wrote:
> > Our two proposals are very close after all. I suggest to make
> > the following changes in your patch:
> > 
> >   + Remove the waiting for another console_lock owner. It is
> >     too tricky.
> 
> we lose the printing guarantees this way. what if printk_kthread
> doesn't wake up after all? the whole point of this design twist
> (and previous discussions) was that people spoke up and said that
> they want printk to do the thing it was doing for decades. even if
> it would cause lockup reports sometimes (but it doesn't seem to be
> such a common problem after all. how many people see printk lockup
> reports more or less regularly?).

I do see printk lockup regularly because warn_alloc() from __alloc_pages_may_oom()
is basically doing

   while (1) {
     dump_stack();
     cond_resched();
   }

as long as somebody is looping inside console_unlock() with oom_lock mutex held.
Since warn_alloc() is currently a local DoS attack vector, I'm waiting for this
printk offloading patch to be merged into mainline, but it seems that this
offloading might not be able to become a solution to the warn_alloc() lockup problem.


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-29  7:33           ` Sergey Senozhatsky
  2017-06-29 11:24             ` Tetsuo Handa
@ 2017-06-30  7:01             ` Sergey Senozhatsky
  2017-06-30 10:18               ` Tetsuo Handa
  2017-06-30 13:16               ` Petr Mladek
  2017-06-30 11:54             ` Petr Mladek
  2 siblings, 2 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30  7:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

Hello,

On (06/29/17 16:33), Sergey Senozhatsky wrote:
[..]
> On (06/28/17 14:19), Petr Mladek wrote:
> [..]
> > > so I try to minimize the negative impact of RT prio here. printk_kthread
> > > is not special any more. it's an auxiliary kthread that we sometimes
> > > wake_up. the thing is that printk_kthread also must offload at some
> > > point, basically the same `atomic_print_limit' limit applies to it as
> > > well.
> > 
> > You might call cond_resched() outside console_unlock(). But you have
> > to keep printk_kthread in runnable state as long as there are pending
> > messages. Then scheduler will always prefer this RT task over non-RT
> > tasks. Or am I wrong?
> 
> if we try to offload from IRQ->console_unlock() (or with preemption
> disabled, etc. etc.) and scheduler decides to enqueue printk_kthread
> on the same CPU, then no offloading will take place. I can reproduce
> it on my system. we need to play some affinity games, I think. but
> there are corner cases, once again.

something like this (in console_offload_printing()). try to keep
printk_kthread out of this_cpu->rq, so we (hopefully) wake it up
on some other CPU (if there are other CPUs online), that is not
in console_unlock() now:

---
        if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
                wake_up_process(printk_kthread);
                return true;
        }

        cpumask_copy(cpus_allowed, cpu_online_mask);
        cpumask_clear_cpu(smp_processor_id(), cpus_allowed);

        /*
         * If this_cpu is the only one online, then try to wake up
         * `printk_kthread' on it. what else we can do...
         */
        if (cpumask_empty(cpus_allowed))
                cpumask_set_cpu(smp_processor_id(), cpus_allowed);

        set_cpus_allowed_ptr(printk_kthread, cpus_allowed);
        wake_up_process(printk_kthread);

        free_cpumask_var(cpus_allowed);
---

once again, this is merely for testing purposes. I haven't done
many tests yet, will continue testing; but looks promising. there
are some corner cases here.

JFYI, pushed updated patch set to my tree
https://github.com/sergey-senozhatsky/linux-next-ss/commits/printk-kthread-affine

it also contains a debugging patch that I'm using to track
printk_kthread behaviour.


I'm still thinking about Steven's proposals; but we will need offloading
anyways, so the bits we are talking about here are important regardless
of the direction the printk design will take, I think.


> [..]
> > Our two proposals are very close after all. I suggest to make
> > the following changes in your patch:
> > 
> >   + Remove the waiting for another console_lock owner. It is
> >     too tricky.
> 
> we lose the printing guarantees this way. what if printk_kthread
> doesn't wake up after all? the whole point of this design twist
> (and previous discussions) was that people spoke up and said that
> they want printk to do the thing it was doing for decades. even if
> it would cause lockup reports sometimes

to be clear on this.


we are doing our best to avoid lockups caused by console_unlock(),
but the top priority remains the message printout. If we can't guarantee that
anything will take over and print the messages, we continue printing from
the current process, even though it may result in lockups.


this is based on my own experience with the previous "wake_up and forget
about it" async printk patch set (v12) which we rolled out to our fleet
of developers' boards; responses we received from the community; and
somehow it also aligned with Linus' recent reply

 : If those two things aren't the absolutely primary goals, the whole
 : thing is pointless to even discuss. No amount of cool features,
 : performance, or theoretical deadlock avoidance matters ONE WHIT
 : compared to the two things above.

// the two things above were -- messages on the screen and dmesg.


so, offloading is cool and avoiding lockups is also very much desired,
but we need to print out the messages in the first place.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30  7:01             ` Sergey Senozhatsky
@ 2017-06-30 10:18               ` Tetsuo Handa
  2017-06-30 11:57                 ` Sergey Senozhatsky
  2017-06-30 13:16               ` Petr Mladek
  1 sibling, 1 reply; 51+ messages in thread
From: Tetsuo Handa @ 2017-06-30 10:18 UTC (permalink / raw)
  To: sergey.senozhatsky.work, pmladek
  Cc: sergey.senozhatsky, rostedt, jack, akpm, peterz, rjw, ebiederm,
	gregkh, jslaby, pavel, andi, linux-kernel

Sergey Senozhatsky wrote:
>         if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
>                 wake_up_process(printk_kthread);
>                 return true;
>         }

Please avoid memory allocations when trying to print something.
__GFP_DIRECT_RECLAIM allocations (e.g. GFP_KERNEL) can sleep for an
unpredictable duration. Allocations without __GFP_NOWARN will cause
e.g. memory allocation failure messages. Even with __GFP_NOWARN,
some messages might still be printed (e.g. on a serious problem).

> I'm still thinking about Steven's proposals; but we will need offloading
> anyways, so the bits we are talking about here are important regardless
> the direction printk design will take, I think.

Is there a chance that printk() waits only for the data queued by that printk()
call (the exception being printk() from NMI)? If we carry a penalty for printk()
(charging a delay according to the amount of data queued by that printk()), users
will stop doing stupid flooding with printk() based on the assumption that an
offloaded kernel thread will magically cope with a guarantee of everything being
printed out (i.e. users have to become careful).


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-29  7:33           ` Sergey Senozhatsky
  2017-06-29 11:24             ` Tetsuo Handa
  2017-06-30  7:01             ` Sergey Senozhatsky
@ 2017-06-30 11:54             ` Petr Mladek
  2017-06-30 12:42               ` Sergey Senozhatsky
  2 siblings, 1 reply; 51+ messages in thread
From: Petr Mladek @ 2017-06-30 11:54 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Thu 2017-06-29 16:33:22, Sergey Senozhatsky wrote:
> On (06/28/17 14:19), Petr Mladek wrote:
> [..]
> > > at the same we have better guarantees.
> > > we don't just wakeup(printk_kthread) and leave. we wait for any other
> > > process to re-take the console_sem. until this happens we can't leave
> > > console_unlock().
> > 
> > And this is my problem. I am scared of the waiting. It is very hard
> > to predict, especially without RT priority. But it is tricky anyway,
> > see above.
> 
> but.....
> the opposite possibility is that messages either won't be printed
> soon (until next printk or console_unlock()) or won't be printed
> ever at all (in case of sudden system death). I don't think it's
> a good alternative.

I see it like a weighing machine. There is a "guaranteed" output on
one side and a softlockups prevention on the other side. The more
we prevent the softlockups the less we guarantee the output.

We do not have the same opinion about the balance. My solution
completely prevents softlockups. Your one tries to be more
conservative. It might look that a compromise is better but
we need to consider how it is achieved, what the effect
and side-effects are.


My main unresolved doubts about this patchset are:

1. It gives other processes only a very small chance to take
   over the job. They either need to call console_trylock()
   in a very small "race" window or they need to call
   console_lock(), where console_lock() only signals
   that the caller is willing to take the job and puts
   him to sleep.

   Another detailed description of this problem can be found
   in my previous mail, see
   https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz


2. It adds a rather complex dependency on the scheduler. I know
   that my simplified solution does this as well, but in another
   way[*]. Let me explain. I would split the dependency into
   the code side and the behavior side.

   From the code side: The current printk() calls wake_up_process()
   and we need to use printk_deferred() in the related scheduler code.
   This patch does this as well, so there is no win and no loss.
   Well, you talk about changing the affinity and other tricks
   in the other mails. This might add more locations where
   printk_deferred() would be needed.

   From the behavior side: The current code wakes the process
   and is done. The code in this patch wakes the process and
   waits until it[**] gets a CPU and really runs. It switches to
   the emergency mode when the other process does not run in time.
   By other words, this patch depends on more actions done
   by the scheduler and changes behavior based on it. IMHO,
   this will be very hard to code, tune, and debug.
   A proper solution might require more code dependency.

   [*] My solution depends on the scheduler in the sense
       that messages will get lost when nobody else will take
       over the console job. The logic is simple, no scheduler
       => only atomic_print_limit messages. It might sound
       drastic but see below. The main win is that it is "simple".

   [**] It is enough when any other process takes over the
       console_lock. But this is tricky, see my 1st point above.


3. The prevention of soft-lockups is questionable. If you are in
   a soft-lockup-prone situation, the only prevention is to do an
   offload. But if you switch to the emergency mode and give
   up offloading too early, the new code stops preventing
   the softlockup.

   Of course, the patchset does not make it worse but the question
   is how much it really helps. It would be bad to add a lot of
   code/complexity with almost no gain.


IMHO, if we try to solve the 1st problem (chance of offloading),
it might add a risk of deadlocks and/or make the 2nd problem
(dependency on scheduler) worse. Also I am afraid that we would
repeat many dead ends already tried by Jan Kara.

If you try to improve the 3rd problem and make some guarantees
of the soft-lockup prevention, it would make the 2nd problem
(dependency on scheduler) worse. Also the code might be
very hard to understand and tune.


This is why I look for a rather simple solution. IMHO, we both
agree that:

   +  the offload will be activated only when there is
      a flood of messages

   + the only reason to wait for the other handler is to
     better handle sudden death where panic() is not called.

IMHO, the only one who brought the problem of sudden death
was Pavel Machek. AFAIK, he works on embedded systems and
hardware enablement. I guess the combination of the flood
of messages and sudden death is rare there. Also I doubt
that the current code handles it well. The flood is badly
handled in general. In any case, I wonder how long we could
survive flushing messages when there is sudden death
and scheduling does not work.

One problem here is that some questions/doubts are hard to
answer/prove without wide testing.

A compromise might be to start with the simple code
and disable the offloading by default. I am sure that
there will be volunteers that would want to play with it,
e.g. Tetsuo. We would enable it in SUSE as well because
there should not be any regression against what we have
used for years now. We could then make it more complex
according to the feedback and eventually enable it
by default.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 10:18               ` Tetsuo Handa
@ 2017-06-30 11:57                 ` Sergey Senozhatsky
  2017-06-30 12:35                   ` Tetsuo Handa
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 11:57 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, pmladek, sergey.senozhatsky, rostedt,
	jack, akpm, peterz, rjw, ebiederm, gregkh, jslaby, pavel, andi,
	linux-kernel

On (06/30/17 19:18), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> >         if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
> >                 wake_up_process(printk_kthread);
> >                 return true;
> >         }
> 
> Please avoid memory allocations when trying to print something.
> __GFP_DIRECT_RECLAIM allocations (e.g. GFP_KERNEL) can sleep for
> unpredictable duration. Allocations without __GFP_NOWARN will cause
> e.g. memory allocation failure messages. Even with __GFP_NOWARN,
> some messages might be still printed (e.g. serious problem).

wow... dammit, what a stupid mistake. you are 100% right, thanks!
it's sooo unsafe and dumb, console_unlock() and, thus, offloading
can happen from IRQ.

thanks again.

> > I'm still thinking about Steven's proposals; but we will need offloading
> > anyways, so the bits we are talking about here are important regardless
> > the direction printk design will take, I think.
> 
> Is there a chance that printk() waits for only data queued by that printk()
> call (exception will be printk() from NMI).

hm, I don't think this can be done easily... consider

	console_lock();
	printk();
	printk();
	...			-> this guys will wait forever. nothing
				   flushes the logbuf.
	printk();
	console_unlock();


> If we carry penalty for printk() (charge delay according to amount of
> data queued by that printk()), users will stop doing stupid flooding
> with printk() based on an assumption that offloaded kernel thread will
> manage magically with guarantee of being printed out (i.e. users has
> to become careful).

ratelimiting is on my list. but it's a bit tricky... what should happen
if one does

	CPU1

	-> IRQ
		printk()
			console_unlock();
		printk()
			console_unlock();
		...
		printk()
			ratelimit();
			console_unlock();

	-> NMI panic()


need to think more.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 11:57                 ` Sergey Senozhatsky
@ 2017-06-30 12:35                   ` Tetsuo Handa
  2017-06-30 12:50                     ` Sergey Senozhatsky
  0 siblings, 1 reply; 51+ messages in thread
From: Tetsuo Handa @ 2017-06-30 12:35 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: pmladek, sergey.senozhatsky, rostedt, jack, akpm, peterz, rjw,
	ebiederm, gregkh, jslaby, pavel, andi, linux-kernel

Sergey Senozhatsky wrote:
> On (06/30/17 19:18), Tetsuo Handa wrote:
> > > I'm still thinking about Steven's proposals; but we will need offloading
> > > anyways, so the bits we are talking about here are important regardless
> > > the direction printk design will take, I think.
> > 
> > Is there a chance that printk() waits for only data queued by that printk()
> > call (exception will be printk() from NMI).
> 
> hm, I don't think this can be done easily... consider
> 
> 	console_lock();
> 	printk();
> 	printk();
> 	...			-> this guys will wait forever. nothing
> 				   flushes the logbuf.
> 	printk();
> 	console_unlock();

Can't we remove console_lock()/console_unlock() from printk()?
I think that having printk() depend on console_unlock() is complicating things.


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 11:54             ` Petr Mladek
@ 2017-06-30 12:42               ` Sergey Senozhatsky
  2017-06-30 13:33                 ` Steven Rostedt
  2017-07-04 14:03                 ` Petr Mladek
  0 siblings, 2 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 12:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

Hello,

On (06/30/17 13:54), Petr Mladek wrote:
> > but.....
> > the opposite possibility is that messages either won't be printed
> > soon (until next printk or console_unlock()) or won't be printed
> > ever at all (in case of sudden system death). I don't think it's
> > a good alternative.
> 
> I see it like a weighing machine. There is a "guaranteed" output on
> one side and a softlockups prevention on the other side. The more
> we prevent the softlockups the less we guarantee the output.

I apply a very simple litmus test. if the answer to the question
"so we leave console_unlock() and there are pending messages,
who is going to flush the remaining messages, and when?" is
"something sometime in the future" then it's a no-no.

"something sometime in the future" is equal to "no one".

we must stay and continue printing. because it gives the right
answer - "current process and right now. until someone else
(+printk_kthread) takes over".


> We do not have the same opinion about the balance. My solution
> completely prevents softlockups.

not at all costs. especially if we talk about the possibility of losing
messages. if we consider such a possibility, then let's just
unconditionally drop logbuf entries every time console_unlock() is
looping for too long.

> Your one tries to be more conservative.

Linus wrote:

: If those two things aren't the absolutely primary goals, the whole
: thing is pointless to even discuss. No amount of cool features,
: performance, or theoretical deadlock avoidance matters ONE WHIT
: compared to the two things above.

and

Linus wrote:

: But not having any messages at all, because we were trying so hard to
: abstract things out and put them in buffers so that we couldn't
: deadlock with the IO routines, and the timer or workqueue that was
: supposed to do it is never going to happen any more because of the bug
: that is trying to be printed out?
:
: THAT is bad.


I think our priorities should be quite clear here.


> My main unresolved doubts about this patchset are:
> 
> 1. It gives other processes only a very small chance to take
>    over the job. They either need to call console_trylock()
>    in a very small "race" window or they need to call
>    console_lock(), where console_lock() only signals
>    that the caller is willing to take the job and puts
>    him to sleep.

printk_kthread does console_lock().

we may (and need to) improve the retry path in console_unlock().
but we must not leave it until another process locks the console_sem.

>    Another detailed description of this problem can be found
>    in my previous mail, see
>    https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz
> 
> 
> 2. It adds a rather complex dependency on the scheduler. I know
>    that my simplified solution does this as well, but in another
>    way[*]. Let me explain. I would split the dependency into
>    the code side and the behavior side.
> 
>    From the code side: The current printk() calls wake_up_process()
>    and we need to use printk_deferred() in the related scheduler code.
>    This patch does this as well, so there is no win and no lose.
>    Well, you talk about changing the affinity and other tricks
>    in the other mails. This might add more locations where
>    printk_deferred() would be needed.

we are in printk_safe all the way through console_offload_printing(),
the context is derived from console_unlock().

why would we need printk_deferred()?

>    From the behavior side: The current code wakes the process
>    and is done. The code in this patch wakes the process and
>    waits until it[**] gets CPU and really runs. It switches to
>    the emergency mode when the other process does not run in time.
>    By other words, this patch depends on more actions done
>    by the scheduler and changes behavior based on it. IMHO,
>    this will be very hard to code, tune, and debug.
>    A proper solution might require more code dependency.
> 
>    [*] My solution depends on the scheduler in the sense
>        that messages will get lost when nobody else will take
>        over the console job.

which is precisely and exactly the thing that we should never
let happen. there is no _win_, because we _lost_ the messages.


> 3. The prevention of soft-lockups is questionable. If you are in
>    soft-lockup prone situation, the only prevention is to do an
>    offload. But if you switch to the emergency mode and give
>    up offloading too early, the new code stops preventing
>    the softlockup.
> 
>    Of course, the patchset does not make it worse but the question
>    is how much it really helps. It would be bad to add a lot of
>    code/complexity with almost no gain.
> 
> 
> IMHO, if we try to solve the 1st problem (chance of offloading),
> it might add a risk of deadlocks and/or make the 2nd problem
> (dependency on scheduler) worse. Also I am afraid that we would
> repeat many dead ends already tried by Jan Kara.

what deadlock?


> If you will try to improve 3rd problem and make some guaranties
> of the soft-lockup prevention, it would make the 2nd problem
> (dependency on scheduler) worse. Also the code might be
> very hard to understand and tune.
> 
> 
> This is why I look for a rather simple solution. IMHO, we both
> agree that:
> 
>    +  the offload will be activated only when there is
>       a flood of messages
> 
>    + the only reason to wait for the other handler is to
>      better handle sudden death where panic() is not called.
> 
> IMHO, the only one who brought the problem of sudden death
> was Pavel Machek.

we gave up on printk-async. the last bug report was titled "blah blah
missing backtrace". and I really would prefer to see that
backtrace + soft lockup or even hard lockup. that still would have been
better than seeing nothing at all.

	-ss

> AFAIK, he works on embedded systems and hardware enablement.
> I guess the combination of the flood of messages and sudden
> death is rare there. Also I doubt that the current code handles
> it well. The flood is badly handled in general. In any case,
> I wonder how long we could survive flushing messages when there
> is sudden death and scheduling does not work.
> 
> One problem here is that some questions/doubts are hard to
> answer/prove without wide testing.
> 
> A compromise might be to start with the simple code
> and disable the offloading by default. I am sure that
> there will be volunteers that would want to play with it,
> e.g. Tetsuo. We would enable it in SUSE as well because
> there should not be any regression against what we have
> used for years now. We could then make it more complex
> according to the feedback and eventually enable it
> by default.
> 
> Best Regards,
> Petr
> 


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 12:35                   ` Tetsuo Handa
@ 2017-06-30 12:50                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 12:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, pmladek, sergey.senozhatsky, rostedt,
	jack, akpm, peterz, rjw, ebiederm, gregkh, jslaby, pavel, andi,
	linux-kernel

On (06/30/17 21:35), Tetsuo Handa wrote:
> 
> Sergey Senozhatsky wrote:
> > On (06/30/17 19:18), Tetsuo Handa wrote:
> > > > I'm still thinking about Steven's proposals; but we will need offloading
> > > > anyways, so the bits we are talking about here are important regardless
> > > > the direction printk design will take, I think.
> > > 
> > > Is there a chance that printk() waits for only data queued by that printk()
> > > call (exception will be printk() from NMI).
> > 
> > hm, I don't think this can be done easily... consider
> > 
> > 	console_lock();
> > 	printk();
> > 	printk();
> > 	...			-> these guys will wait forever. nothing
> > 				   flushes the logbuf.
> > 	printk();
> > 	console_unlock();
> 
> Can't we remove console_lock()/console_unlock() from printk() ?

we can't... well, we can... and there are some ideas but we are years
away from all it becoming reality. console_sem is the giant and major
lock that fbcon, drm, tty and so on are using. please see a sub-thread
starting from here:

https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004389.html

> I think that printk() depends on console_unlock() is complicating.

it absolutely is.

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30  7:01             ` Sergey Senozhatsky
  2017-06-30 10:18               ` Tetsuo Handa
@ 2017-06-30 13:16               ` Petr Mladek
  2017-06-30 13:38                 ` Sergey Senozhatsky
  2017-06-30 14:10                 ` Sergey Senozhatsky
  1 sibling, 2 replies; 51+ messages in thread
From: Petr Mladek @ 2017-06-30 13:16 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Fri 2017-06-30 16:01:31, Sergey Senozhatsky wrote:
> we are doing our best in order to avoid lockups caused by console_unlock(),
> but the top priority remains getting the messages out. If we can't guarantee that
> anything will take over and print the messages, we continue printing from
> the current process, even though it may result in lockups.
>
> this is based on my own experience with the previous "wake_up and forget
> about it" async printk patch set (v12) which we rolled out to our fleet
> of developers' boards;

Well, v12 completely failed when there was a sudden death. Also
printk_kthread slept with console_lock() taken. Therefore it was
much less effective during printk() floods.


> responses we received from the community; and
> somehow it also aligned with the recent Linus' reply
> 
>  : If those two things aren't the absolutely primary goals, the whole
>  : thing is pointless to even discuss. No amount of cool features,
>  : performance, or theoretical deadlock avoidance matters ONE WHIT
>  : compared to the two things above.
> 
> // the two things above were -- messages on the screen and dmesg.

Sure, this sounds cool but things are not black and white.

We have the offload patches in SUSE for years because some big
machines did not boot without them. On the other hand, AFAIK,
we have zero bug reports on losing messages caused by a flood
of messages. To be fair, we often look into crash dumps for
these messages.

I remember that you mentioned losing messages in several threads.
I wonder if it is caused by different configuration, use case,
or extra patches. Anyway, it might suggest that you use the
printk() system (buffers, throughput) at the edge of, or even beyond,
its capacity.


Anyway, the handshake during offloading might be pretty
problematic. To be honest, I do not have much experience
with it. I have shared some of my fears in the other mail[1].
Jan Kara spent a lot of time on this and probably could
say more.

Maybe, we could try to look into the throttling path. Slowing down
massive printk() callers looks necessary when things get
out of control.

I wonder if I could add some counter into task_struct.
It might be configurable. I am not sure if people would
want this enabled on production systems where the level
of messages should be lower anyway.

[1] https://lkml.kernel.org/r/20170630115457.GE23069@pathway.suse.cz

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 12:42               ` Sergey Senozhatsky
@ 2017-06-30 13:33                 ` Steven Rostedt
  2017-06-30 14:28                   ` Sergey Senozhatsky
  2017-07-04 14:03                 ` Petr Mladek
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2017-06-30 13:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Fri, 30 Jun 2017 21:42:24 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> I apply a very simple litmus test. if the answer to the question
> "so we leave console_unlock() and there are pending messages,
> who and when is going to flush the remaining messages?" is
> "something sometime in the future" then it's a no-no.

I totally agree with this, but...

> 
> "something sometime in the future" is equal to "no one".
> 
> we must stay and continue printing. because it gives the right
> answer - "current process and right now. until someone else
> (+printk_kthread) takes over".

Would it be acceptable to have a user knob that allows for it not to
happen? That is, let the user of the kernel decide if they care about
critical prints or not? If a knob says, "only print X, then offload"
would that be allowed. Of course the default would be "only print ALL
OF IT" to keep the current behavior.

A lot of times the console isn't recorded to debug hard lockups. I
know most desktops running a GUI do not. Whenever my workstation locks
up, and it has no serial, I don't get to see the dmesg at all. In this
situation, I don't care if the prints are offloaded or not.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 13:16               ` Petr Mladek
@ 2017-06-30 13:38                 ` Sergey Senozhatsky
  2017-07-03 11:11                   ` Sergey Senozhatsky
  2017-06-30 14:10                 ` Sergey Senozhatsky
  1 sibling, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 13:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/30/17 15:16), Petr Mladek wrote:
> Anyway, the handshake during offloading might be pretty
> problematic. To be honest, I do not have much experience
> with it. I have shared some of my fears in the other mail[1].
> Jan Kara spent a lot of time on this and probably could
> say more.
> 
> Maybe, we could try to look into the throttling path. Slowing down
> massive printk() callers looks necessary when things get
> out of control.

throttling, in some form, is already there. I think.

there is a printk_delay() function, which we can silently activate
when things don't look cool anymore. and printk_delay() is already
getting called on every vprintk_emit() entry. the question is -- how
big should our delay value be, and... when do we need to activate
printk_delay()?

when the distance between console_seq and log_next_seq... suggests
that we will drop (overwrite) un-flushed messages sooner than console_seq
reaches log_next_seq? so log_next_seq is closer to log_first_seq than
console_seq to log_next_seq. or something like this... I'm a bit tired
after a long week; need more time to think about it.


the same printk_delay() *maybe* can be used in console_unlock(), to
give the potential new console_sem owner more time. just an idea.

> I wonder if I could add some counter into task_struct.
> It might be configurable. I am not sure if people would
> want this enabled on production systems where the level
> of messages should be lower anyway.

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 13:16               ` Petr Mladek
  2017-06-30 13:38                 ` Sergey Senozhatsky
@ 2017-06-30 14:10                 ` Sergey Senozhatsky
  1 sibling, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 14:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/30/17 15:16), Petr Mladek wrote:
[..]
>
> I wonder if I could add some counter into task_struct.
> It might be configurable. I am not sure if people would
> want this enabled on production systems where the level
> of messages should be lower anyway.

I don't think it really matters which one of the tasks has issued way
too many printk()-s. it's the state of the logbuf that matters after
all. whether it was one task or 5 tasks, or 15 that pushed logbuf far
ahead of console_seq, we need to throttle all new printk() calls anyway.
just because logbuf is already too far ahead and we need to give the
console a chance to catch up. so let's activate throttling based on the
logbuf state (those seq numbers).

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 13:33                 ` Steven Rostedt
@ 2017-06-30 14:28                   ` Sergey Senozhatsky
  2017-06-30 14:45                     ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-06-30 14:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/30/17 09:33), Steven Rostedt wrote:
[..]
> > "something sometime in the future" is equal to "no one".
> > 
> > we must stay and continue printing. because it gives the right
> > answer - "current process and right now. until someone else
> > (+printk_kthread) takes over".
> 
> Would it be acceptable to have a user knob that allows for it not to
> happen? That is, let the user of the kernel decide if they care about
> critical prints or not? If a knob says, "only print X, then offload"
> would that be allowed. Of course the default would be "only print ALL
> OF IT" to keep the current behavior.

I'm sorry, I'm not sure I got your question right (I need some rest
probably).

I guess the question was, a knob that would determine what happens after
current wakes up printk_kthread -- does it stay in console_unlock() and
wait for a new console_sem owner, printing the messages in the meantime,
or does it go all in and expect printk_kthread or anything else to lock
console_sem at some point and flush the remaining messages. is that
correct? we can do this (well, I'm absolutely not in a position to say
"we can't do this" :) ). I guess the sort of problem we have now is
that we can't guarantee that wake_up() will actually wake up printk_kthread.
but if the user requests it, then well... it might be easier to adjust the
watchdog timeout value ;) just kidding. or maybe I misunderstood your question.


or... another silly idea... shall we start touching the lockup
watchdog per-cpu counters on the CPU that has:

a) woken up printk_kthread
b) but is still in console_unlock() waiting for the new console_sem owner?

...

or we, may be, can add a new sysrq that would flush logbuf messages...
the same way as console_flush_on_panic() does. just in case if wake_up
didn't work.... or is there already a sysrq for that?

	-ss

> A lot of times the console isn't recorded to debug hard lockups. I
> know most desktops running a GUI do not. Whenever my workstation locks
> up, and it has no serial, I don't get to see the dmesg at all. In this
> situation, I don't care if the prints are offloaded or not.
> 
> -- Steve
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 14:28                   ` Sergey Senozhatsky
@ 2017-06-30 14:45                     ` Steven Rostedt
  2017-07-01  1:50                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2017-06-30 14:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Fri, 30 Jun 2017 23:28:51 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> I guess the question was, a knob that would determine what happens after
> current wakes up printk_kthread -- does it stay in console_unlock() and
> wait for a new console_sem owner, printing the messages in the meantime,
> or does it go all in and expect printk_kthread or anything else to lock
> console_sem at some point and flush the remaining messages. is that
> correct? we can do this (well, I'm absolutely not in a position to say
> "we can't do this" :) ). I guess the sort of problem we have now is
> that we can't guarantee that wake_up() will actually wake up printk_kthread.
> but if the user requests it, then well... it might be easier to adjust the
> watchdog timeout value ;) just kidding. or maybe I misunderstood your question.

Actually, what I was thinking of was just setting a hard limit at how
much printk() can write without pushing it off to a helper thread. Kind
of like how softirq works. If there's too much work, it then pushes off
the rest to ksoftirqd.

If we have a kprintd, have a knob that says how much printk can write
under console_lock, and after it hits a limit (default being no limit),
then to simply wake up kprintd to do the rest. This should allow those
that care to limit how much time printk can stop a CPU for, with the
knowledge that if the kernel were to crash, there's a good chance that
the necessary debug info will not be printed.

Now as I have said, the default would be to keep things as is, and let
printk take up as much of the CPU it needs till it gets all the pending
prints out.

I would also like (in my wish list), a mechanism, that when an oops in
progress happens, to set an emergency flag. If something already has
console_lock, then we can let it do what it does today, and have the
critical thread write its data to the logbuf, and let the current
printer hopefully output it (I think that's what we do today). But if
there's no holder of console lock, the critical thread takes control
and dumps the buffer to get everything out. Once that emergency flag is
set, there will be no more offloading. All prints will be in critical
mode, and try to get the data out if possible.

While I'm giving you my Christmas list, I would also like a way to have
an NMI safe lock in printk (like Peter Zijlstra and I have patches for
with early-printk), which checks if the current CPU owns the spin lock,
and if it doesn't, it is safe to take the lock even in NMI mode. But if
the current CPU has the lock, then the NMI handler does the printing
without locking, not caring if it screws up the thread that is currently
printing.

Of course we need to be careful about various consoles that have their
own locks. Maybe make an NMI lock primitive that all consoles use?

This is just wishful thinking, but hopefully we can come up with
something that can solve all issues reasonably.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 14:45                     ` Steven Rostedt
@ 2017-07-01  1:50                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-01  1:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (06/30/17 10:45), Steven Rostedt wrote:
> On Fri, 30 Jun 2017 23:28:51 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> 
> > I guess the question was, a knob that would determine what happens after
> > current wakes up printk_kthread -- does it stay in console_unlock() and
> > wait for new console_sem owner, printing the messages in the meantime,
> > or goes all in and expects printk_kthread or anything else to lock
> > console_sem at some point and flush the remaining messages. is that
> > correct? we can do this (well, I'm absolutely not in position to say
> > "we can't do this" :) ). I guess the sort of a problem we have now is
> > that we can't guarantee that wake_up() will actually wake_up printk_kthread.
> > but if user requests it, then well... it might be easier to adjust watchdog
> > timeout value ;) just kidding. or may be I misunderstood your question.
> 
> Actually, what I was thinking of was just setting a hard limit at how
> much printk() can write without pushing it off to a helper thread. Kind
> of like how softirq works. If there's too much work, it then pushes off
> the rest to ksoftirqd.

oh, OK. this is exactly the way it works now. we've got a sysfs knob

 echo 150 > /sys/module/printk/parameters/atomic_print_limit

which sets the limit on the number of lines we print in direct (old
printk) mode. as soon as current prints more than atomic_print_limit
lines it wakes up printk_kthread.

the default value for atomic_print_limit is 0, which means no limit.
IOW, no offloading at all, aka the old behaviour.

> If we have a kprintd, have a knob that says how much printk can write
> under console_lock, and after it hits a limit (default being no limit),
> then to simply wake up kprintd to do the rest. This should allow those
> that care to limit how much time printk can stop a CPU for, with the
> knowledge that if the kernel were to crash, there's a good chance that
> the necessary debug info will not be printed.

yes. this is how it's done in the patch set. with some additional tricks.

> Now as I have said, the default would be to keep things as is, and let
> printk take up as much of the CPU it needs till it gets all the pending
> prints out.

yes. this is the way it's done in the patch set.

> I would also like (in my wish list), a mechanism, that when an oops in
> progress happens, to set an emergency flag. If something already has
> console_lock, then we can let it do what it does today, and have the
> critical thread write its data to the logbuf, and let the current
> printer hopefully output it (I think that's what we do today). But if
> there's no holder of console lock, the critical thread takes control
> and dumps the buffer to get everything out. Once that emergency flag is
> set, there will be no more offloading. All prints will be in critical
> mode, and try to get the data out if possible.

yes. this is what we are trying to do in the patch set.

when we decide if we need to offload printing, we call this guy first

static inline bool printk_offloading_enabled(void)
{
	return atomic_print_limit &&
		printk_enforce_emergency == 0 &&
		printk_kthread &&
		atomic_read(&printk_emergency) == 0;
}

which checks printk's internal emergency flags, etc. for example,
printk_enforce_emergency is set automatically the first time we
printk() a LOGLEVEL_EMERG message.


then we check the system wide flags

	if (system_state != SYSTEM_RUNNING || oops_in_progress)
		return false;


if all checks pass (no oops, no emergency, nothing) then we continue
with the offloading thing. increment the counter which tracks the number
of lines already printed and offload printing if that counter
goes above the user-defined atomic_print_limit threshold.

otherwise, we just stay and continue printing, because we are in oops or
emergency, or whatever.

> While I'm giving you my Christmas list, I would also like a way to have
> an NMI safe lock in printk (like Peter Zijlstra and I have patches for
> with early-printk). Which checks if the current CPU owns the spin lock,
> and if it doesn't it is safe to take the lock even in NMI mode. But if
> the current CPU has the lock, then the NMI handler does the printing
> without locking, not caring if it screws up the thread that is currently
> printing.
> 
> Of course we need to be careful about various consoles that have their
> own locks. Maybe make an NMI lock primitive that all consoles use?
> 
> This is just wishful thinking, but hopefully we can come up with
> something that can solve all issues reasonably.

ok, the NMI thing is not in the current patch set. we can do it a bit
later. the early-printk patch set is also interesting. and I hope that
Petr will take a look at it once we settle down the offloading.

	-ss

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-03 11:11                   ` Sergey Senozhatsky
@ 2017-07-02 20:27                     ` Pavel Machek
  2017-07-05 16:36                       ` Sergey Senozhatsky
  2017-07-03 19:34                     ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-07-02 20:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Mon 2017-07-03 20:11:30, Sergey Senozhatsky wrote:
> On (06/30/17 22:38), Sergey Senozhatsky wrote:
> > On (06/30/17 15:16), Petr Mladek wrote:
> > > Anyway, the handshake during offloading might be pretty
> > > problematic. To be honest, I do not have much experience
> > > with it. I have shared some my fears in the other mail[1].
> > > Jan Kara spent a lot of time on this and probably could
> > > say more.
> > > 
> > > Maybe, we could try to look into the throotling path. Slowing down
> > > massive printk() callers looks necessary when things gets
> > > out of control.
> > 
> > throttling, in some form, is already there. I think.
> > 
> > there is a printk_delay() function. which we can silently activate
> > when things don't look cool anymore. and printk_delay() is already
> > getting called on every vprintk_emit() entry. the question is -- how
> > big should be our delay value, and... when do we need to activate
> > printk_delay()?
> > 
> > when the distance between console_seq and log_next_seq... suggests
> > that we will drop (overwrite) un-flushed messages sooner than console_seq
> > reaches log_next_seq? so log_next_seq is closer to log_first_seq than
> > console_seq to log_next_seq.
> 
> something like below, maybe. a sketch, just to demonstrate the
> idea. but, once polished, it can go into printk outside of this series.
> 
> ===8<===8<===8<===
> 
> Throttle printk() callers when we detect that consoles are
> far behind the logbuf: the number of messages already printed
> to the consoles is less than 4 times the number still pending.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 38 ++++++++++++++++++++++++++++++++------
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index f24d3789faa0..fd546bd95207 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1758,17 +1758,43 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
>  	}
>  }
>  
> +#define PRINTK_FLOOD_DEFAULT_DELAY	10
> +
>  int printk_delay_msec __read_mostly;

= 10;

> +	if (console_seen < 4 * console_to_see) {
> +		if (printk_delay_msec)
> +			__printk_delay(printk_delay_msec);
> +		else
> +			__printk_delay(PRINTK_FLOOD_DEFAULT_DELAY);

And get rid of if () here? We want to enable delays of 0...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 13:38                 ` Sergey Senozhatsky
@ 2017-07-03 11:11                   ` Sergey Senozhatsky
  2017-07-02 20:27                     ` Pavel Machek
  2017-07-03 19:34                     ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-03 11:11 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On (06/30/17 22:38), Sergey Senozhatsky wrote:
> On (06/30/17 15:16), Petr Mladek wrote:
> > Anyway, the handshake during offloading might be pretty
> > problematic. To be honest, I do not have much experience
> > with it. I have shared some my fears in the other mail[1].
> > Jan Kara spent a lot of time on this and probably could
> > say more.
> > 
> > Maybe, we could try to look into the throotling path. Slowing down
> > massive printk() callers looks necessary when things gets
> > out of control.
> 
> throttling, in some form, is already there. I think.
> 
> there is a printk_delay() function. which we can silently activate
> when things don't look cool anymore. and printk_delay() is already
> getting called on every vprintk_emit() entry. the question is -- how
> big should be our delay value, and... when do we need to activate
> printk_delay()?
> 
> when the distance between console_seq and log_next_seq... suggests
> that we will drop (overwrite) un-flushed messages sooner than console_seq
> reaches log_next_seq? so log_next_seq is closer to log_first_seq than
> console_seq to log_next_seq.

something like below, maybe. a sketch, just to demonstrate the
idea. but, once polished, it can go into printk outside of this series.

===8<===8<===8<===

Throttle printk() callers when we detect that consoles are
far behind the logbuf: the number of messages already printed
to the consoles is less than 4 times the number still pending.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 38 ++++++++++++++++++++++++++++++++------
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f24d3789faa0..fd546bd95207 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1758,17 +1758,43 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
 	}
 }
 
+#define PRINTK_FLOOD_DEFAULT_DELAY	10
+
 int printk_delay_msec __read_mostly;
 
+static inline void __printk_delay(int m)
+{
+	while (m--) {
+		mdelay(1);
+		touch_nmi_watchdog();
+	}
+}
+
 static inline void printk_delay(void)
 {
-	if (unlikely(printk_delay_msec)) {
-		int m = printk_delay_msec;
+	unsigned long flags;
+	u64 console_seen = 0, console_to_see;
 
-		while (m--) {
-			mdelay(1);
-			touch_nmi_watchdog();
-		}
+	if (printk_delay_msec) {
+		__printk_delay(printk_delay_msec);
+		return;
+	}
+
+	/*
+	 * Check if consoles are far behind the logbuf head and
+	 * throttle printk() callers if so.
+	 */
+	logbuf_lock_irqsave(flags);
+	if (console_seq > log_first_seq)
+		console_seen = console_seq - log_first_seq;
+	console_to_see = log_next_seq - console_seq;
+	logbuf_unlock_irqrestore(flags);
+
+	if (console_seen < 4 * console_to_see) {
+		if (printk_delay_msec)
+			__printk_delay(printk_delay_msec);
+		else
+			__printk_delay(PRINTK_FLOOD_DEFAULT_DELAY);
 	}
 }
 
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-03 11:11                   ` Sergey Senozhatsky
  2017-07-02 20:27                     ` Pavel Machek
@ 2017-07-03 19:34                     ` Steven Rostedt
  2017-07-04  5:26                       ` Sergey Senozhatsky
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2017-07-03 19:34 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Mon, 3 Jul 2017 20:11:30 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> +#define PRINTK_FLOOD_DEFAULT_DELAY	10
> +
>  int printk_delay_msec __read_mostly;
>  
> +static inline void __printk_delay(int m)
> +{
> +	while (m--) {
> +		mdelay(1);
> +		touch_nmi_watchdog();
> +	}
> +}
> +
>  static inline void printk_delay(void)
>  {
> -	if (unlikely(printk_delay_msec)) {
> -		int m = printk_delay_msec;
> +	unsigned long flags;
> +	u64 console_seen = 0, console_to_see;
>  
> -		while (m--) {
> -			mdelay(1);
> -			touch_nmi_watchdog();
> -		}
> +	if (printk_delay_msec) {
> +		__printk_delay(printk_delay_msec);
> +		return;
> +	}
> +

This had better be an option, and not the default. And what happens if the
printk caller happens to preempt the one doing the writes to consoles?

-- Steve

> +	/*
> +	 * Check if consoles are far behind the logbuf head and
> +	 * throttle printk() callers if so.
> +	 */
> +	logbuf_lock_irqsave(flags);
> +	if (console_seq > log_first_seq)
> +		console_seen = console_seq - log_first_seq;
> +	console_to_see = log_next_seq - console_seq;
> +	logbuf_unlock_irqrestore(flags);
> +
> +	if (console_seen < 4 * console_to_see) {
> +		if (printk_delay_msec)
> +			__printk_delay(printk_delay_msec);
> +		else
> +			__printk_delay(PRINTK_FLOOD_DEFAULT_DELAY);
>  	}
>  }
>  

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-03 19:34                     ` Steven Rostedt
@ 2017-07-04  5:26                       ` Sergey Senozhatsky
  2017-07-04  6:53                         ` Sergey Senozhatsky
  0 siblings, 1 reply; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-04  5:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (07/03/17 15:34), Steven Rostedt wrote:
> > +#define PRINTK_FLOOD_DEFAULT_DELAY	10
> > +
> >  int printk_delay_msec __read_mostly;
> >  
> > +static inline void __printk_delay(int m)
> > +{
> > +	while (m--) {
> > +		mdelay(1);
> > +		touch_nmi_watchdog();
> > +	}
> > +}
> > +
> >  static inline void printk_delay(void)
> >  {
> > -	if (unlikely(printk_delay_msec)) {
> > -		int m = printk_delay_msec;
> > +	unsigned long flags;
> > +	u64 console_seen = 0, console_to_see;
> >  
> > -		while (m--) {
> > -			mdelay(1);
> > -			touch_nmi_watchdog();
> > -		}
> > +	if (printk_delay_msec) {
> > +		__printk_delay(printk_delay_msec);
> > +		return;
> > +	}
> > +
> 
> This had better be an option, and not default.

yes.

> And what happens if the printk caller happens to preempt the one
> doing the writes to consoles?

in short - we just burn CPU cycles. that case is broken.

that's mostly the reason behind PRINTK_FLOOD_DEFAULT_DELAY being quite
small.

one can simply do

	console_lock();
	printk();
	printk();
	....
	printk();
	console_unlock();

and trigger useless throttling. it is needed in the general case,
but useless in the given circumstances.

not sure if we can properly throttle printk in all of the cases.
we know that console_sem is locked, but we don't know what for.
is the CPU that owns the console_sem now in console_unlock(), or
somewhere in fbcon, or anywhere else? we probably should not
throttle printk() if we know that console_sem is already locked
by this_cpu and we simply call printk either from an IRQ that
preempted console_unlock() on this_cpu or from a recursive printk
in console_unlock()... and so on.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-04  5:26                       ` Sergey Senozhatsky
@ 2017-07-04  6:53                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-04  6:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Petr Mladek, Sergey Senozhatsky, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

On (07/04/17 14:26), Sergey Senozhatsky wrote:
[..]
> not sure if we can properly throttle printk in all of the cases.
> we know that console_sem is locked, but we don't know what for.
> is the CPU that owns the console_sem now in console_unlock(), or
> somewhere in fbcon, or anywhere else? we probably should not
> throttle printk() if we know that console_sem is already locked
> by this_cpu and we simply call printk either from an IRQ that
> preempted console_unlock() on this_cpu or from a recursive printk
> in console_unlock()... and so on.

which is hard to do, given that console_unlock() can schedule with
console_sem locked. so the CPU number won't do the trick. unless we
forbid preemption in console_unlock()... which we sort of need to do.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-06-30 12:42               ` Sergey Senozhatsky
  2017-06-30 13:33                 ` Steven Rostedt
@ 2017-07-04 14:03                 ` Petr Mladek
  2017-07-05  5:02                   ` Sergey Senozhatsky
  1 sibling, 1 reply; 51+ messages in thread
From: Petr Mladek @ 2017-07-04 14:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, Rafael J . Wysocki, Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel

On Fri 2017-06-30 21:42:24, Sergey Senozhatsky wrote:
> Hello,
> 
> On (06/30/17 13:54), Petr Mladek wrote:
> > > but.....
> > > the opposite possibility is that messages either won't be printed
> > > soon (until next printk or console_unlock()) or won't be printed
> > > ever at all (in case of sudden system death). I don't think it's
> > > a good alternative.
> > 
> > I see it like a weighing machine. There is a "guaranteed" output on
> > one side and a softlockups prevention on the other side. The more
> > we prevent the softlockups the less we guarantee the output.
> 
> I apply a very simple litmus test. if the answer to the question
> "so we leave console_unlock() and there are pending messages,
> who and when is going to flush the remaining messages?" is
> "something sometime in the future" then it's a no-no.
>
> "something sometime in the future" is equal to "no one".

I am afraid that it is hard or even impossible to have offloading
and be sure that the other side continues. IMHO, there always
will be a possibility that something could go wrong.

I think that insisting on keeping the current guarantees might
force us to keep the current code (status quo).

Please, let me better explain the problems that I see with
the current code.


> > My main unresolved doubts about this patchset are:
> > 
> > 1. It gives other processes only a very small chance to take
> >    over the job. They either need to call console_trylock()
> >    in a very small "race" window or they need to call
> >    console_lock(), where console_lock() only signals
> >    that the caller is willing to take the job and puts
> >    it to sleep.
> 
> printk_kthread does console_lock().

But this is not enough. In fact, it makes things worse. Let me
explain:

We are talking about a flood of messages when we go over
atomic_print_limit and activate offloading. Let's look
at it in more detail:

CPU0				CPU1


console_unlock()

  // handle atomic_print_limit lines
  // enable offloading

  wake_up_process(printk_kthread)

  // handle many other lines

				// printk_kthread finally gets CPU
				// woken in printk_kthread_func()

				console_lock()
				  down()
				    // lock most likely taken by CPU0
				    list_add_tail(&waiter.list)
				    schedule()

   // handle one more line
   up_console_sem()
     list_del(&waiter.list)
     wake_up_process(waiter->task)

   console_trylock()
   // fails because the lock is
   // taken by the waiter

Regression 1: Nobody flushes the messages to the console until
	the printk_kthread() is scheduled again, really takes
	the console_lock(), and calls console_unlock().

	The original code flushed the messages
	all the time until all were flushed. (Except
	if it was allowed to sleep in console_unlock).
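The handoff above can be modeled with a toy semaphore (illustrative
userspace C, not the kernel's real semaphore code): down() queues the
caller as a waiter, up() hands ownership directly to the first waiter,
so the previous owner's trylock fails even though the waiter has not
run yet:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the console_sem handoff -- illustrative only. */
struct toy_sem {
	bool locked;
	int owner;		/* -1: none, otherwise a task id */
	int first_waiter;	/* -1: empty waiter list */
};

/* down(): take the lock, or queue up and (in the kernel) go to sleep */
static bool toy_down(struct toy_sem *s, int task)
{
	if (!s->locked) {
		s->locked = true;
		s->owner = task;
		return true;
	}
	s->first_waiter = task;	/* list_add_tail() + schedule() */
	return false;
}

/* up(): hand the lock directly to the first waiter, if there is one */
static void toy_up(struct toy_sem *s)
{
	if (s->first_waiter != -1) {
		s->owner = s->first_waiter;	/* new owner still asleep! */
		s->first_waiter = -1;
	} else {
		s->locked = false;
		s->owner = -1;
	}
}

static bool toy_trylock(struct toy_sem *s, int task)
{
	if (s->locked)
		return false;
	s->locked = true;
	s->owner = task;
	return true;
}
```

Running the diagram through this model: CPU0 takes the lock, the
kthread queues as a waiter, CPU0's up() transfers ownership, and
CPU0's subsequent trylock fails while the new owner is still asleep.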


Regression 2: printk_kthread() takes console_lock() in
	   preemptible context => is allowed to sleep
	   and nobody is able to get console_lock
	   in the meantime => slowdown and risk of
	   lost messages.


In addition, the printk kthread might not get scheduled
at all when:

  + It is assigned to the same CPU as the current
    console_lock() owner.

  + It is assigned to a CPU where some higher priority
    task went mad.


Finally, it might be scheduled too late when we already
handled too many lines over the atomic limit and switched
to the emergency mode.

How big should the atomic limit be to give the kthread a chance
to get scheduled in time? It might be pretty complicated
mathematics. It will depend on the hardware, system
load, the scheduler in use, namespace limitations, ...


> we may (and need to) improve the retry path in console_unlock().
> but we must not leave it until another process locks the console_sem.

Several possibilities came to my mind but they all have problems.
Let me explain.


1. Make printk_kthread() special and avoid sleeping with
   console_lock() taken => force offload when need_resched
   gets true. But this still keeps regression 1.

   Also it means that we are offloading without guarantees.
   But the same is true for sleeping with console_lock.


2. Increase the chance that other printk() callers will take
   over console_lock():

     + Add delay before console_trylock() in console_unlock().
       But this is very ugly. Waste of resources => no way


     + console_unlock() might signalize offloading and
       printk() caller might call console_lock() instead
       of console_trylock().

       It must be only one printk() caller (first one?). It does
       not make sense to block all CPUs and the entire system.

       Anyway, console_lock() might sleep and printk() does not know
       the context => no way

       The only chance is that we combine this with throttling
       tasks that do too many printk() calls. But it is still
       ugly and hard to predict.


     + Replace console_lock() with spin lock.
       But this would require complete rework/clean up
       => task for years

Did I miss a nice trick?


> >    Another detailed description of this problem can be found
> >    in my previous mail, see
> >    https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz
> > 
> > 
> > 2. It adds rather complex dependency on the scheduler. I know
> >    that my simplified solution does this as well but in another way[*]
> >    Let me explain. I would split the dependency on the code
> >    and behavior relation.
> > 
> >    From the code side: The current printk() calls wake_up_process()
> >    and we need to use printk_deferred() in the related scheduler code.
> >    This patch does this as well, so there is no win and no lose.
> >    Well, you talk about changing the affinity and other tricks
> >    in the other mails. This might add more locations where
> >    printk_deferred() would be needed.
> 
> we are in printk_safe all the way through console_offload_printing(),
> the context is derived from console_unlock().
> 
> why would we need printk_deferred()?

I think that the reason is the same as why we still need
printk_deferred() now.

printk_safe() helps only when the recursion starts inside
printk(). It fails when printk() is called from a function
that is used inside printk().



> >    From the behavior side: The current code wakes the process
> >    and is done. The code in this patch wakes the process and
> >    waits until it[**] gets CPU and really runs. It switches to
> >    the emergency mode when the other process does not run in time.
> >    In other words, this patch depends on more actions done
> >    by the scheduler and changes behavior based on it. IMHO,
> >    this will be very hard to code, tune, and debug.
> >    A proper solution might require more code dependency.
> > 
> >    [*] My solution depends on the scheduler in the sense
> >        that messages will get lost when nobody else will take
> >        over the console job.
> 
> which is precisely and exactly the thing that we should never
> let happen. there is no _win_, because we _lost_ the messages.

I do not see a way to avoid this at the moment. Therefore I
suggest changes that might reduce the possible damage as much
as possible:

   + reduce sleeping with console_lock() => this will actually
     increase the chance to flush the messages in panic()


   + use high enough atomic_print_limit so that we prevent
     blocking in atomic context for seconds but that we trigger
     offloading only in very rare situations

     Maybe we could use a number of seconds for the limit
     instead of a number of lines.

     Note that I still consider a flood of messages to be a broken
     system. If anyone gets into this situation and loses messages,
     he should fix/reduce the flood first. The atomic limits
     should flush enough messages to do so.


   + allow throttling the printk() callers that cause the flood;
     this looks like a win-win solution; it handles the root
     of the problem

   + eventually write on the console that offloading
     has happened, so that we could see when it failed

   + add the early printk stuff from Peter Zijlstra so that
     developers have another chance to debug these problems

     This is needed already now when printk() fails.


Does this really sound that bad?

Of course, there might be situations where printk() might behave
worse. But we should still allow disabling offloading completely
for these cases.

Well, I admit that it is hard to be sure. printk() is used
in too many situations. But we either want to do something
or we could just keep the status quo.


> 
> > 3. The prevention of soft-lockups is questionable. If you are in
> >    soft-lockup prone situation, the only prevention is to do an
> >    offload. But if you switch to the emergency mode and give
> >    up offloading too early, the new code stops preventing
> >    the softlockup.
> > 
> >    Of course, the patchset does not make it worse but the question
> >    is how much it really helps. It would be bad to add a lot of
> >    code/complexity with almost no gain.
> > 
> > 
> > IMHO, if we try to solve the 1st problem (chance of offloading),
> > it might add a risk of deadlocks and/or make the 2nd problem
> > (dependency on scheduler) worse. Also I am afraid that we would
> > repeat many dead ways already tried by Jan Kara.
> 
> what deadlock?

The real risk seems to be a missing printk_deferred()
needed in newly used code.

I also had some wrongly done hand-shake of console_lock that
caused several deadlocks. For example, we might wait in
console_unlock() for someone who will never be scheduled. Or we
start waiting in printk() for console_lock() but there
might already be other waiters in the list. There might
be some dependency between them. OK, this is rather
theoretical.

Anyway, using more code in printk() might cause recursion
and deadlocks. So, we need to be careful.

Uff, all this is complicated. I wish I managed to explain
this more easily.

Best Regards,
Petr

PS: I am going to be off until the next Monday. It might take
some time until I process all the other mails. I try hard to
sort the ideas and keep some clear view.


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-04 14:03                 ` Petr Mladek
@ 2017-07-05  5:02                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-05  5:02 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Jan Kara,
	Andrew Morton, Peter Zijlstra, Rafael J . Wysocki,
	Eric Biederman, Greg Kroah-Hartman, Jiri Slaby, Pavel Machek,
	Andreas Mohr, Tetsuo Handa, linux-kernel

Hello,

Petr, thanks for commenting/taking a look

On (07/04/17 16:03), Petr Mladek wrote:
> On Fri 2017-06-30 21:42:24, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > On (06/30/17 13:54), Petr Mladek wrote:
> > > > but.....
> > > > the opposite possibility is that messages either won't be printed
> > > > soon (until next printk or console_unlock()) or won't be printed
> > > > ever at all (in case of sudden system death). I don't think it's
> > > > a good alternative.
> > > 
> > > I see it like a weighing machine. There is a "guaranteed" output on
> > > one side and a softlockups prevention on the other side. The more
> > > we prevent the softlockups the less we guarantee the output.
> > 
> > I apply a very simple litmus test. if the answer to the question
> > "so we leave console_unlock() and there are pending messages,
> > who and when is going to flush the remaining messages?" is
> > "something sometime in the future" then it's a no-no.
> >
> > "something sometime in the future" is equal to "no one".
> 
> I am afraid that it is hard or even impossible to have offloading
> and be sure that the other side continues. IMHO, there always
> will be a possibility that something could go wrong.
> 
> I think that insisting on keeping the current guarantees might
> force us to keep the current code (status quo).
> 
> Please, let me better explain the problems that I see with
> the current code.
> 
> 
> > > My main unresolved doubts about this patchset are:
> > > 
> > > 1. It gives other processes only a very small chance to take
> > >    over the job. They either need to call console_trylock()
> > >    in a very small "race" window or they need to call
> > >    console_lock(), where console_lock() only signals
> > >    that the caller is willing to take the job and puts
> > >    it to sleep.
> > 
> > printk_kthread does console_lock().
> 
> But this is not enough. In fact, it makes things worse. Let me
> explain:
> 
> We are talking about a flood of messages when we go over
> atomic_print_limit and activate offloading. Let's look
> at it in more detail:
> 
> CPU0				CPU1
> 
> 
> console_unlock()
> 
>   // handle atomic_print_limit lines
>   // enable offloading
> 
>   wake_up_process(printk_kthread)
> 
>   // handle many other lines
>
> 				// printk_kthread finally gets CPU
> 				// woken in printk_kthread_func()
> 
> 				console_lock()
> 				  down()
> 				    // lock most likely taken by CPU0
> 				    list_add_tail(&waiter.list)
> 				    schedule()
> 
>    // handle one more line
>    up_console_sem()
>      list_del(&waiter.list)
>      wake_up_process(waiter->task)
> 
>    console_trylock()
>    // fails because the lock is
>    // taken by the waiter
> 
> Regression 1: Nobody flushes the messages to the console until
> 	the printk_kthread() is scheduled again, really takes
> 	the console_lock(), and calls console_unlock().

yeah, ok. I'm not sure I can fully agree that this is a regression.

this is how the code is meant to work, isn't it?
some other CPU could have slept on console_sem from e.g. tty_open()
by the time CPU0 attempts to re-take the lock, forcing CPU0
  a) to fail console_trylock()
and
  b) to delay the flush of the pending messages.

so, sorry, I'm not sure I see why this is a regression. the behaviour
is pretty much the same. what am I missing?

... yet, after offloading we leave console_unlock() with the knowledge
that there is a process in console_sem waiting for that lock. which
sounds like something, but at the same time... that's pretty much it.
IOW, "so what?". we just know that there is a process in the waiter
list. if we can't be sure that wake_up_process() from up_console_sem()
succeeds, then that knowledge doesn't buy us anything at all. real
guarantees (that the console_sem owner is running on some CPU) are a
bit harder to achieve, given that console_unlock() can schedule.

so I think I'll drop that "wait for a new console_sem owner" thing
after all. not because we still can't agree on it, but because maybe
it's not worth it. sorry it took me so long.


> 	The original code flushed the messages
> 	all the time until all were flushed. (Except
> 	if it was allowed to sleep in console_unlock).

but any console_lock()/console_trylock() call from any other CPU
(fbcon, tty, etc.) prevents complete flush of the pending messages
until that other CPU invokes console_unlock(). we don't introduce
anything new here.

> Regression 2: printk_kthread() takes console_lock() in
> 	   preemptible context => is allowed to sleep
> 	   and nobody is able to get console_lock
> 	   in the meantime => slowdown and risk of
> 	   lost messages.

do you mean scheduling under console_sem? but this is true for
console_unlock() in general. we already schedule with console_sem,
quite a lot. I can't see how printk_kthread makes it any worse.

> In addition, the printk kthread might not get scheduled
> at all when:
>
>   + It is assigned to the same CPU as the current
>     console_lock() owner.
>
>   + It is assigned to a CPU where some higher priority
>     task went mad.

need to think more.

but we probably can forbid preemption in console_unlock() if atomic
limit is not 0. it kinda makes sense to me.

> Finally, it might be scheduled too late when we already
> handled too many lines over the atomic limit and switched
> to the emergency mode.

the possibility of lockup still exists, yes. I never claimed
otherwise, and that is the part I'm not happy with.

in general,

offloading based on the atomic limit watermark does not magically
address lockups. the limit can simply be too high. so *maybe*
we need to use a `watchdog_thresh'-based check at the end of the
day and drop the atomic limit sysfs knob.

	console_unlock()
	{
		for (;;) {
			if (spent_in_unlock > watchdog_thresh / 2)
				offload-and-return;
		}
	}
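in plain C, such a time-based check might look like this (illustrative
only: the names and millisecond units are made up here, the real
watchdog_thresh is a sysctl in seconds, and this still ignores time
spent with preemption/IRQs disabled before entering console_unlock()):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Offload printing once console_unlock() has run for longer than
 * half the (hypothetical, millisecond) watchdog threshold, no
 * matter how many lines that took.
 */
static bool should_offload(unsigned long spent_in_unlock_ms,
			   unsigned long watchdog_thresh_ms)
{
	return spent_in_unlock_ms > watchdog_thresh_ms / 2;
}

/*
 * How many lines a console needing ms_per_line per line would print
 * before the check above triggers.  The line count adapts to console
 * speed instead of being a fixed atomic_print_limit.
 */
static unsigned long lines_before_offload(unsigned long ms_per_line,
					  unsigned long watchdog_thresh_ms)
{
	unsigned long spent = 0, lines = 0;

	while (!should_offload(spent, watchdog_thresh_ms)) {
		spent += ms_per_line;	/* one call_console_drivers() pass */
		lines++;
	}
	return lines;
}
```

with a 10s threshold and a slow 10ms-per-line serial console, this
would hand off after roughly 500 lines; a fast console gets to print
proportionally more.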

which, once again, won't fully address every single case possible (can
we even do it?). the problem here is that we only know how much time we
spent in console_unlock(), but we have no idea how much time the CPU had
been running with preemption or local IRQs disabled before it got into
console_unlock().

it's the watchdog that knows that information - touch_ts, etc. so
maybe we need to make printk more "watchdog aware" and make offloading
decisions using is_softlockup()/etc. this may look like an
unnecessary dependency/coupling, but the dependency/coupling is
already there; in fact, the lockups that we are trying to avoid are
reported by the watchdog. so the watchdog may hint/help printk, not
just report the lockup and kill the system.

IOW, the watchdog timeout seems to be the only reliable number to
look at when we want to know if a lockup is about to happen.

need to think more

> How big should the atomic limit be to give the kthread a chance
> to get scheduled in time? It might be pretty complicated
> mathematics. It will depend on the hardware, system
> load, the scheduler in use, namespace limitations, ...

yep yep, that's why there is a sysfs knob. there is no "the right number" 
for all :)

> > we may (and need to) improve the retry path in console_unlock().
> > but we must not leave it until another process locks the console_sem.
> 
> Several possibilities came to my mind but they all have problems.
> Let me explain.
> 
> 
> 1. Make printk_kthread() special and avoid sleeping with
>    console_lock() taken => force offload when need_resched
>    gets true. But this still keeps regression 1.
> 
>    Also it means that we are offloading without guarantees.
>    But the same is true for sleeping with console_lock.
> 
> 
> 2. Increase the chance that other printk() callers will take
>    over console_lock():
> 
>      + Add delay before console_trylock() in console_unlock().
>        But this is very ugly. Waste of resources => no way

yeah, I thought about it.

>      + console_unlock() might signalize offloading and
>        printk() caller might call console_lock() instead
>        of console_trylock().
> 
>        It must be only one printk() caller (first one?). It does
>        not make sense to block all CPUs and the entire system.
> 
>        Anyway, console_lock() might sleep and printk() does not know
>        the context => no way
> 
>        The only chance is that we combine this with throttling
>        tasks that do too many printk() calls. But it is still
>        ugly and hard to predict.
> 
> 
>      + Replace console_lock() with spin lock.
>        But this would require complete rework/clean up
>        => task for years

yep.

> Did I miss a nice trick?
> 

I need more time to answer :)


> > >    Another detailed description of this problem can be found
> > >    in my previous mail, see
> > >    https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz
> > > 
> > > 
> > > 2. It adds rather complex dependency on the scheduler. I know
> > >    that my simplified solution does this as well but in another way[*]
> > >    Let me explain. I would split the dependency on the code
> > >    and behavior relation.
> > > 
> > >    From the code side: The current printk() calls wake_up_process()
> > >    and we need to use printk_deferred() in the related scheduler code.
> > >    This patch does this as well, so there is no win and no lose.
> > >    Well, you talk about changing the affinity and other tricks
> > >    in the other mails. This might add more locations where
> > >    printk_deferred() would be needed.
> > 
> > we are in printk_safe all the way through console_offload_printing(),
> > the context is derived from console_unlock().
> > 
> > why would we need printk_deferred()?
> 
> I think that the reason is the same as why we still need
> printk_deferred() now.
> 
> printk_safe() helps only when the recursion starts inside
> printk(). It fails when printk() is called from a function
> that is used inside printk().

what I meant is that in console_unlock()/console_offload_printing() we
are in printk_safe. any printk from here will not end up in vprintk_emit().

> > >    From the behavior side: The current code wakes the process
> > >    and is done. The code in this patch wakes the process and
> > >    waits until it[**] gets CPU and really runs. It switches to
> > >    the emergency mode when the other process does not run in time.
> > >    By other words, this patch depends on more actions done
> > >    by the scheduler and changes behavior based on it. IMHO,
> > >    this will be very hard to code, tune, and debug.
> > >    A proper solution might require more code dependency.
> > > 
> > >    [*] My solution depends on the scheduler in the sense
> > >        that messages will get lost when nobody else will take
> > >        over the console job.
> > 
> > which is precisely and exactly the thing that we should never
> > let happen. there is no _win_, because we _lost_ the messages.
> 
> I do not see a way to avoid this at the moment. Therefore I
> suggest changes that might reduce the possible damage as much
> as possible:
>
>    + reduce sleeping with console_lock() => this will actually
>      increase the chance to flush the messages in panic()
> 
> 
>    + use high enough atomic_print_limit so that we prevent
>      blocking in atomic context for seconds but that we trigger
>      offloading only in very rare situations
> 
>      Maybe we could use a number of seconds for the limit
>      instead of a number of lines.

oh, nice. I talked about it a bit earlier in this mail.

>      Note that I still consider a flood of messages to be a broken
>      system. If anyone gets into this situation and loses messages,
>      he should fix/reduce the flood first. The atomic limits
>      should flush enough messages to do so.

good that you mentioned that. I was going to ask if we can leave
the "flood" case behind. I agree that this case is broken and printk
can't help with it. if messages can be ratelimited, then they had better
be ratelimited. it's the case where we can't ratelimit and begin to
lock up the system that we are trying to fix. for example, a massive
show_state_filter() over a slow serial console.


>    + allow throttling the printk() callers that cause the flood;
>      this looks like a win-win solution; it handles the root
>      of the problem

it's tricky. I would like to throttle, but it's tricky. need to
think more about how we can make it more or less reliable.

	local_irq_disable();
	printk();
	...
	printk();		<< throttle
	local_irq_restore();	<< lockup


calling log_store() faster than another CPU can call_console_drivers()
is bad, but a hard lockup is even worse.

OK, I'll re-shape and re-work the patch set.
will push it to my tree sometime this week.


>    + eventually write on the console that offloading
>      has happened, so that we could see when it failed
> 
>    + add the early printk stuff from Peter Zijlstra so that
>      developers have another chance to debug these problems
> 
>      This is needed already now when printk() fails.
> 
> 
> Does this really sound that bad?
> 
> Of course, there might be situations where printk() might behave
> worse. But we should still allow disabling offloading completely
> for these cases.
> 
> Well, I admit that it is hard to be sure. printk() is used
> in too many situations. But we either want to do something
> or we could just keep the status quo.
> 
> 
> > 
> > > 3. The prevention of soft-lockups is questionable. If you are in
> > >    soft-lockup prone situation, the only prevention is to do an
> > >    offload. But if you switch to the emergency mode and give
> > >    up offloading too early, the new code stops preventing
> > >    the softlockup.
> > > 
> > >    Of course, the patchset does not make it worse but the question
> > >    is how much it really helps. It would be bad to add a lot of
> > >    code/complexity with almost no gain.
> > > 
> > > 
> > > IMHO, if we try to solve the 1st problem (chance of offloading),
> > > it might add a risk of deadlocks and/or make the 2nd problem
> > > (dependency on scheduler) worse. Also I am afraid that we would
> > > repeat many dead ways already tried by Jan Kara.
> > 
> > what deadlock?
> 
> The real risk seems to be a missing printk_deferred()
> needed in newly used code.
> 
> I also had some wrongly done hand-shake of console_lock that
> caused several deadlocks. For example, we might wait in
> console_unlock() for someone who will never be scheduled. Or we
> start waiting in printk() for console_lock() but there
> might already be other waiters in the list. There might
> be some dependency between them. OK, this is rather
> theoretical.
> 
> Anyway, using more code in printk() might cause recursion
> and deadlocks. So, we need to be careful.
> 
> Uff, all this is complicated. I wish I managed to explain
> this more easily.
> 
> Best Regards,
> Petr
> 
> PS: I am going to be off until the next Monday. It might take
> some time until I process all the other mails. I try hard to
> sort the ideas and keep some clear view.

thanks. have fun.

	-ss


* Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
  2017-07-02 20:27                     ` Pavel Machek
@ 2017-07-05 16:36                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 51+ messages in thread
From: Sergey Senozhatsky @ 2017-07-05 16:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, Jan Kara, Andrew Morton, Peter Zijlstra,
	Rafael J . Wysocki, Eric Biederman, Greg Kroah-Hartman,
	Jiri Slaby, Andreas Mohr, Tetsuo Handa, linux-kernel

Hello,

On (07/02/17 22:27), Pavel Machek wrote:
[..]
> > +#define PRINTK_FLOOD_DEFAULT_DELAY	10
> > +
> >  int printk_delay_msec __read_mostly;
> 
> = 10;

the idea is to throttle printk() only when (and as long as) we see
that something is not right. like when we added 4 (a random number)
times more messages than we were able to print so far. printk_delay_msec
would throttle all printk calls. there is a difference.

there are some cases that are sort of broken with this automatic
throttling. the simplest one is printk() under console_lock().
the harder one is throttling printk() from IRQ context. that
can be risky, in theory. maybe I'm overcomplicating here; if
the system has hard locked up due to printk throttling from IRQ,
then probably it was going to lock up anyway.
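for reference, the heuristic from the patch fragment above as a
standalone sketch (illustrative userspace C; the factor 4 and
PRINTK_FLOOD_DEFAULT_DELAY are the arbitrary values under discussion,
not settled numbers):

```c
#include <assert.h>
#include <stdbool.h>

#define PRINTK_FLOOD_DEFAULT_DELAY	10	/* msec, an arbitrary small value */

/*
 * console_seen:   lines already pushed to the consoles
 *                 (console_seq - log_first_seq)
 * console_to_see: lines still pending in the logbuf
 *                 (log_next_seq - console_seq)
 *
 * Delay the printk() caller only while the pending backlog is large
 * relative to what has already been printed; otherwise the consoles
 * are keeping up and we return 0 (no throttling).
 */
static int flood_delay_ms(unsigned long long console_seen,
			  unsigned long long console_to_see,
			  int printk_delay_msec)
{
	if (console_seen >= 4 * console_to_see)
		return 0;
	return printk_delay_msec ? printk_delay_msec
				 : PRINTK_FLOOD_DEFAULT_DELAY;
}
```

unlike a plain printk_delay_msec, which would delay every printk call,
this only kicks in while the producers outrun the console.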

need to check sysrq-t case, tho.

> > +	if (console_seen < 4 * console_to_see) {
> > +		if (printk_delay_msec)
> > +			__printk_delay(printk_delay_msec);
> > +		else
> > +			__printk_delay(PRINTK_FLOOD_DEFAULT_DELAY);
> 
> And get rid of if () here? We want to enable delays of 0...

this if () is totally useless and stupid, yes. realized after I sent
the mail.

	-ss


end of thread, other threads:[~2017-07-05 16:37 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-09  8:28 [RFC][PATCHv3 0/5] printk: introduce printing kernel thread Sergey Senozhatsky
2017-05-09  8:28 ` [RFC][PATCHv3 1/5] printk: move printk_pending out of per-cpu Sergey Senozhatsky
2017-05-25 12:11   ` Petr Mladek
2017-05-25 12:36     ` Sergey Senozhatsky
2017-05-25 12:43       ` Petr Mladek
2017-05-09  8:28 ` [RFC][PATCHv3 2/5] printk: introduce printing kernel thread Sergey Senozhatsky
2017-05-10  5:59   ` Sergey Senozhatsky
2017-05-29  9:29     ` Petr Mladek
2017-05-29 12:12       ` Jan Kara
2017-05-31  7:30         ` Sergey Senozhatsky
2017-05-31 21:44           ` Andreas Mohr
2017-06-01  7:21           ` Sergey Senozhatsky
2017-06-01  7:23             ` Sergey Senozhatsky
2017-06-01  9:20             ` Sergey Senozhatsky
2017-06-28 13:17             ` Petr Mladek
2017-06-29  7:40               ` Sergey Senozhatsky
2017-06-28 12:45           ` Petr Mladek
2017-05-31  7:22       ` Sergey Senozhatsky
2017-06-28 12:19         ` Petr Mladek
2017-06-29  6:26           ` Andreas Mohr
2017-06-29  7:00             ` Andreas Mohr
2017-06-29  7:33           ` Sergey Senozhatsky
2017-06-29 11:24             ` Tetsuo Handa
2017-06-30  7:01             ` Sergey Senozhatsky
2017-06-30 10:18               ` Tetsuo Handa
2017-06-30 11:57                 ` Sergey Senozhatsky
2017-06-30 12:35                   ` Tetsuo Handa
2017-06-30 12:50                     ` Sergey Senozhatsky
2017-06-30 13:16               ` Petr Mladek
2017-06-30 13:38                 ` Sergey Senozhatsky
2017-07-03 11:11                   ` Sergey Senozhatsky
2017-07-02 20:27                     ` Pavel Machek
2017-07-05 16:36                       ` Sergey Senozhatsky
2017-07-03 19:34                     ` Steven Rostedt
2017-07-04  5:26                       ` Sergey Senozhatsky
2017-07-04  6:53                         ` Sergey Senozhatsky
2017-06-30 14:10                 ` Sergey Senozhatsky
2017-06-30 11:54             ` Petr Mladek
2017-06-30 12:42               ` Sergey Senozhatsky
2017-06-30 13:33                 ` Steven Rostedt
2017-06-30 14:28                   ` Sergey Senozhatsky
2017-06-30 14:45                     ` Steven Rostedt
2017-07-01  1:50                       ` Sergey Senozhatsky
2017-07-04 14:03                 ` Petr Mladek
2017-07-05  5:02                   ` Sergey Senozhatsky
2017-05-09  8:28 ` [RFC][PATCHv3 3/5] printk: add enforce_emergency parameter Sergey Senozhatsky
2017-05-09  8:28 ` [RFC][PATCHv3 4/5] printk: enable printk offloading Sergey Senozhatsky
2017-05-09  8:28 ` [RFC][PATCHv3 5/5] printk: register PM notifier Sergey Senozhatsky
2017-05-30  9:55   ` Petr Mladek
2017-05-31  6:42     ` Sergey Senozhatsky
2017-06-27 14:45       ` Petr Mladek
