All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] printk: introduce printing kernel thread
@ 2017-03-06 12:45 Sergey Senozhatsky
  2017-03-06 12:45 ` [RFC][PATCH 1/4] " Sergey Senozhatsky
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 12:45 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello,

	RFC

	This patch set adds a printk() kernel thread which lets us to
print kernel messages to the console from a non-atomic/schedule-able
context, avoiding different sort of lockups, stalls, etc.


The patch set is in [RFC] stage - it may contain ugly and dumb ideas.
The reason I decided to take a step back is because we've been using
printk kthread in development/production for ~10 months now and there
are some observations from my side, which I want to list/outline.

 1) We had a `bool' flag to temporarily disable printk offloading (in suspend,
 hibernate, etc.) to address a bug reported by Viresh Kumar [1]. Turned out
 that our in-house kernel tweaks/ad hocs in PM code introduce nested sections
 that disable/enable printk offloading and thus `bool' flag can re-enable
 printk-kthread too early. (it wasn't me who added those PM 'tweaks' :)).

 Would there be any objections if we switch to `int' flag and allow nested
 printk-kthread enable/disable sections?


 2) Just letting you know,
 printk kthread changes the behavior of printk in one _corner case_.
 The corner case is quite interesting and actually consists of two corner
 cases. Suppose on SMP system there is only one CPU that printk()-s a lot,
 the rest of CPUs don't lock console_sem and don't printk(). Previously
 that printing CPU had been throttling itself (*) because of console drivers
 call for every printk():

          CPU0

       printk("a")
        console_unlock()
         call_console_drivers("a")

         ...

       printk("z")
        console_unlock()
         call_console_drivers("z")

 * Given that no other CPU locks the console_sem.

 With printk kthread the case turns into this one:

          CPU0                               CPU1

       printk("a")
       wake_up printk_kthread
       ...                               printk_kthread
       printk("k")                        console_unlock()
       ...                                 call_console_drivers("a")
       printk("z")                         call_console_drivers("b")
                                           call_console_drivers("c")
                                           ...


 The second 'corner case' part here is that CPU0 may be much faster
 than printing CPU, which may result in dropped printk messages.

 This all is absolutely possible even with out the printk-kthread.
 A single console_lock() call from CPUx will result in exactly the
 same condition. So it's not necessarily a regression. But there may
 be some scenarios in the kernel that may suffer from this change.
 From the top of my head -- sysrq backtrace dump, and, probably, OOM
 print out and backtrace dump.

 sysrq is potentially even trickier. can we always wake_up() kernel
 thread from sysrq? there probably might be cases when we can't rely
 on the scheduler.

 So may be we would want to export disable/enable printk-kthread offloading
 symbols and let people use them in appropriate places.

 What do you think?


 3) A minor one,
 'async printk' is, apparently, a confusing name. people assume that the
 opposite to 'asynchronous printk' is 'synchronous printk'. which is not
 the case. printk may accidentally be synchronous but in general there is
 no way to make it always synchronous on SMP system for all CPUs (well,
 except for wrapping every printk() in console_lock()/console_unlock(),
 probably), because it's up to the CPU that holds the console_sem to print
 kernel messages from console_unlock() at some point. so in some sense it's
 asynchronous by design. printk-kthread targets different aims anyway.
 but still I would probably prefer to avoid this confusion.



[1] this is from the days when printk-kthread had RT prio
    lkml.kernel.org/r/20160714221251.GE3057@ubuntu



= Patches 0001/0002 just introduce the printk kthread, it's not eanbled yet.

= Patch 0003 is incomplete and a bit ugly. I need to think more. It exposes
some printk internals and in general I'm not really happy with it. PM parts,
probably, can be kept within the printk.c. For example, disable printk-kthread
in suspend_console() and enable it in resume_console(). unlike sysrq though.

= Patch 0004 enables printk-kthread.


Sergey Senozhatsky (4):
  printk: introduce printing kernel thread
  printk: offload printing from wake_up_klogd_work_func()
  kernel, power: disable printk_kthread in unsafe places
  printk: enable printk offloading

 include/linux/console.h  |   3 ++
 kernel/kexec_core.c      |   4 ++
 kernel/power/hibernate.c |   8 +++
 kernel/power/suspend.c   |   4 ++
 kernel/printk/printk.c   | 133 ++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 144 insertions(+), 8 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC][PATCH 1/4] printk: introduce printing kernel thread
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
@ 2017-03-06 12:45 ` Sergey Senozhatsky
  2017-03-22 16:40   ` Petr Mladek
  2017-03-06 12:45 ` [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func() Sergey Senozhatsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 12:45 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky, Jan Kara

printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons. Unlike log_store(), console_unlock() comes at
a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console. Thus, it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
and more.

This patch introduces a dedicated printing kernel thread - printk_kthread.
The main purpose of this kthread is to offload printing to a non-atomic
and always scheduleable context, which eliminates 4) and makes 1)-3) less
critical. printk() now just appends log messages to the kernel log buffer
and wake_up()s printk_kthread instead of locking console_sem and calling
into potentially unsafe console_unlock().

This, however, is not always safe on its own. For example, we can't call
into the scheduler from panic(), because this may cause deadlock. That's
why printk() switches back to the old behaviour (console_unlock() from
vprintk_emit()) in those cases. In general, this switch happens
automatically once a EMERG log level message appears in the log buffer.
Another cases when wake_up() might not work as expected are suspend,
hibernate, etc. For those situations we provide two new functions:
 -- console_printing_thread_off()
    Disables printk offloading. All printk() calls will attempt
    to lock the console_sem and, if successful, flush kernel log
    messages.

 -- console_printing_thread_on()
    Enables printk offloading.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/console.h |   3 ++
 kernel/printk/printk.c  | 101 +++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 5949d1855589..bea775609249 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void console_printing_thread_off(void);
+extern void console_printing_thread_on(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2984fb0f0257..1c4232ca2e6a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
+#include <linux/kthread.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -437,6 +438,46 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread __read_mostly;
+/* When `true' printing thread has messages to print. */
+static bool printk_kthread_need_flush_console;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread), for example,
+ * during suspend/kexec. This temporarily switches printk to old behaviour.
+ */
+static int printk_kthread_disable __read_mostly;
+/*
+ * Switch printk to old behavior permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_emergency __read_mostly;
+
+static inline bool printk_kthread_enabled(void)
+{
+	return !printk_emergency &&
+		printk_kthread && READ_ONCE(printk_kthread_disable) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void console_printing_thread_off(void)
+{
+	printk_kthread_disable++;
+	barrier();
+}
+
+/* This re-enables printk_kthread offloading. */
+void console_printing_thread_on(void)
+{
+	barrier();
+	printk_kthread_disable--;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1710,6 +1751,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 		in_sched = true;
 	}
 
+	/*
+	 * Emergency level indicates that the system is unstable and, thus,
+	 * we better stop relying on wake_up(printk_kthread) and try to do
+	 * a direct printing.
+	 */
+	if (level == LOGLEVEL_EMERG)
+		printk_emergency = true;
+
 	boot_delay_msec(level);
 	printk_delay();
 
@@ -1762,12 +1811,28 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
 		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
+		 * Under heavy printing load/slow serial console/etc
+		 * console_unlock() can stall CPUs, which can result in
+		 * soft/hard-lockups, lost interrupts, RCU stalls, etc.
+		 * Therefore we attempt to print the messages to console
+		 * from a dedicated printk_kthread, which always runs in
+		 * schedulable context.
 		 */
-		if (console_trylock())
-			console_unlock();
+		if (printk_kthread_enabled()) {
+			printk_kthread_need_flush_console = true;
+
+			printk_safe_enter_irqsave(flags);
+			wake_up_process(printk_kthread);
+			printk_safe_exit_irqrestore(flags);
+		} else {
+			/*
+			 * Try to acquire and then immediately release the
+			 * console semaphore. The release will print out
+			 * buffers and wake up /dev/kmsg and syslog() users.
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 	}
 
 	return printed_len;
@@ -1874,6 +1939,9 @@ static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
+void console_printing_thread_off(void) {}
+void console_printing_thread_on(void) {}
+
 #endif /* CONFIG_PRINTK */
 
 #ifdef CONFIG_EARLY_PRINTK
@@ -2681,6 +2749,29 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 	.flags = IRQ_WORK_LAZY,
 };
 
+static int printk_kthread_func(void *data)
+{
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!printk_kthread_need_flush_console)
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+		/*
+		 * Avoid an infinite loop when console_unlock() cannot
+		 * access consoles, e.g. because console_suspended is
+		 * true. schedule(), someone else will print the messages
+		 * from resume_console().
+		 */
+		printk_kthread_need_flush_console = false;
+
+		console_lock();
+		console_unlock();
+	}
+
+	return 0;
+}
+
 void wake_up_klogd(void)
 {
 	preempt_disable();
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-03-06 12:45 ` [RFC][PATCH 1/4] " Sergey Senozhatsky
@ 2017-03-06 12:45 ` Sergey Senozhatsky
  2017-03-17 12:19   ` Petr Mladek
  2017-03-06 12:45 ` [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places Sergey Senozhatsky
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 12:45 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Offload printing of printk_deferred() messages from IRQ context
to a schedulable printing kthread, when possible (the same way
we do it in vprintk_emit()). Otherwise, console_unlock() can
force the printing CPU to spend unbound amount of time flushing
kernel messages from IRQ context.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1c4232ca2e6a..6e00073a7331 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2735,9 +2735,16 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
 	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+		if (printk_kthread_enabled()) {
+			wake_up_process(printk_kthread);
+		} else {
+			/*
+			 * If trylock fails, someone else is doing the
+			 * printing
+			 */
+			if (console_trylock())
+				console_unlock();
+		}
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
  2017-03-06 12:45 ` [RFC][PATCH 1/4] " Sergey Senozhatsky
  2017-03-06 12:45 ` [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func() Sergey Senozhatsky
@ 2017-03-06 12:45 ` Sergey Senozhatsky
  2017-03-22 15:38     ` Petr Mladek
  2017-03-06 12:45 ` [RFC][PATCH 4/4] printk: enable printk offloading Sergey Senozhatsky
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 12:45 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

It's not always possible/safe to wake_up() printk kernel
thread. For example, late suspend/early resume may printk()
while timekeeping is not initialized yet, so calling into the
scheduler may result in recursive warnings. Thus we need
printk() to operate in old mode there and attempt to immediately
flush pending kernel message to the console. This patch adds
console_printing_thread_off/on sections.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/kexec_core.c      | 4 ++++
 kernel/power/hibernate.c | 8 ++++++++
 kernel/power/suspend.c   | 4 ++++
 3 files changed, 16 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index bfe62d5b3872..b40fba31150c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1496,6 +1496,8 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+	console_printing_thread_off();
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		lock_system_sleep();
@@ -1565,6 +1567,8 @@ int kernel_kexec(void)
 	}
 #endif
 
+	console_printing_thread_on();
+
  Unlock:
 	mutex_unlock(&kexec_mutex);
 	return error;
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a8b978c35a6a..15634841ccc6 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -502,6 +502,7 @@ int hibernation_restore(int platform_mode)
 {
 	int error;
 
+	console_printing_thread_off();
 	pm_prepare_console();
 	suspend_console();
 	pm_restrict_gfp_mask();
@@ -519,6 +520,7 @@ int hibernation_restore(int platform_mode)
 	pm_restore_gfp_mask();
 	resume_console();
 	pm_restore_console();
+	console_printing_thread_on();
 	return error;
 }
 
@@ -542,6 +544,7 @@ int hibernation_platform_enter(void)
 		goto Close;
 
 	entering_platform_hibernation = true;
+	console_printing_thread_off();
 	suspend_console();
 	error = dpm_suspend_start(PMSG_HIBERNATE);
 	if (error) {
@@ -589,6 +592,7 @@ int hibernation_platform_enter(void)
 	entering_platform_hibernation = false;
 	dpm_resume_end(PMSG_RESTORE);
 	resume_console();
+	console_printing_thread_on();
 
  Close:
 	hibernation_ops->end();
@@ -692,6 +696,7 @@ int hibernate(void)
 		goto Unlock;
 	}
 
+	console_printing_thread_off();
 	pm_prepare_console();
 	error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
 	if (error) {
@@ -759,6 +764,7 @@ int hibernate(void)
  Exit:
 	__pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
 	pm_restore_console();
+	console_printing_thread_on();
 	atomic_inc(&snapshot_device_available);
  Unlock:
 	unlock_system_sleep();
@@ -868,6 +874,7 @@ static int software_resume(void)
 		goto Unlock;
 	}
 
+	console_printing_thread_off();
 	pm_prepare_console();
 	error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls);
 	if (error) {
@@ -884,6 +891,7 @@ static int software_resume(void)
  Finish:
 	__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
 	pm_restore_console();
+	console_printing_thread_on();
 	atomic_inc(&snapshot_device_available);
 	/* For success case, the suspend path will release the lock */
  Unlock:
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 15e6baef5c73..7c6e228b7de4 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -433,6 +433,7 @@ int suspend_devices_and_enter(suspend_state_t state)
 	if (!sleep_state_supported(state))
 		return -ENOSYS;
 
+	console_printing_thread_off();
 	error = platform_suspend_begin(state);
 	if (error)
 		goto Close;
@@ -462,6 +463,7 @@ int suspend_devices_and_enter(suspend_state_t state)
 
  Close:
 	platform_resume_end(state);
+	console_printing_thread_on();
 	return error;
 
  Recover_platform:
@@ -520,6 +522,7 @@ static int enter_state(suspend_state_t state)
 #endif
 
 	pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
+	console_printing_thread_off();
 	pm_suspend_clear_flags();
 	error = suspend_prepare(state);
 	if (error)
@@ -537,6 +540,7 @@ static int enter_state(suspend_state_t state)
  Finish:
 	pr_debug("PM: Finishing wakeup.\n");
 	suspend_finish();
+	console_printing_thread_on();
  Unlock:
 	mutex_unlock(&pm_mutex);
 	return error;
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC][PATCH 4/4] printk: enable printk offloading
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2017-03-06 12:45 ` [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places Sergey Senozhatsky
@ 2017-03-06 12:45 ` Sergey Senozhatsky
  2017-03-22 15:43   ` Petr Mladek
  2017-03-22 17:59 ` [RFC][PATCH 0/4] printk: introduce printing kernel thread Peter Zijlstra
  2017-04-03 11:53 ` Sergey Senozhatsky
  5 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 12:45 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Initialize the kernel printing thread and enable printk()
offloading.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e00073a7331..601a9ef6db89 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2779,6 +2779,25 @@ static int printk_kthread_func(void *data)
 	return 0;
 }
 
+/*
+ * Init printk kthread at late_initcall stage, after core/arch/device/etc.
+ * initialization.
+ */
+static int __init init_printk_kthread(void)
+{
+	struct task_struct *thread;
+
+	thread = kthread_run(printk_kthread_func, NULL, "printk");
+	if (IS_ERR(thread)) {
+		pr_err("printk: unable to create printing thread\n");
+		return PTR_ERR(thread);
+	}
+
+	printk_kthread = thread;
+	return 0;
+}
+late_initcall(init_printk_kthread);
+
 void wake_up_klogd(void)
 {
 	preempt_disable();
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-06 12:45 ` [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func() Sergey Senozhatsky
@ 2017-03-17 12:19   ` Petr Mladek
  2017-03-18  9:57     ` Sergey Senozhatsky
  0 siblings, 1 reply; 29+ messages in thread
From: Petr Mladek @ 2017-03-17 12:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Mon 2017-03-06 21:45:52, Sergey Senozhatsky wrote:
> Offload printing of printk_deferred() messages from IRQ context
> to a schedulable printing kthread, when possible (the same way
> we do it in vprintk_emit()). Otherwise, console_unlock() can
> force the printing CPU to spend unbound amount of time flushing
> kernel messages from IRQ context.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1c4232ca2e6a..6e00073a7331 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2735,9 +2735,16 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
>  	int pending = __this_cpu_xchg(printk_pending, 0);
>  
>  	if (pending & PRINTK_PENDING_OUTPUT) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> +		if (printk_kthread_enabled()) {
> +			wake_up_process(printk_kthread);

I have just noticed a possible race. printk_deferred() does not set
printk_kthread_need_flush_console and there might stay a
pending job:

CPU0					CPU1

printk_kthread_func()

  printk_kthread_need_flush_console = false;

  console_lock()
  console_unlock()

					printk_deferred()
					  vprintk_emit()
					  irq_work_queue()


					<IRQ>
					 wake_up_klogd_work_func()
					   if (printk_kthread_enabled())
					     wake_up_process(printk_kthread);

  set_current_state(TASK_INTERRUPTIBLE);
  if (!printk_kthread_need_flush_console)
    schedule();

Result: printk_kthread goes to sleep even though there is
	a pending job.


A solution might be to rename the variable to something like
printk_pending_output, always set it in vprintk_emit() and
clear it in console_unlock() when there are no pending messages.

I think that we have already discussed this in the past.
This solution would also remove one extra cycle if more messages
are handled by one console_unlock() call:

CPU0					CPU1

printk()
  vprintk_emit()
    printk_kthread_need_flush_console = true;
    wake_up_process(printk_kthread)


					<printk_kthread>

					printk_kthread_need_flush_console
					= false;

					console_lock()

printk()
  vprintk_emit()
    printk_kthread_need_flush_console = true;
    wake_up_process(printk_kthread)

					console_unlock()

					set_current_state(TASK_INTERRUPTIBLE);
					if (!printk_kthread_need_flush_console)
					  <fail>

					  _set_current_state(TASK_RUNNING);

					  console_lock()
					  console_unlock()

Result: The second console_unlock() has nothing to do.


If I remember correctly, you were not much happy with this
solution because it did spread the logic. I think that you did not
believe that it was worth fixing the second problem. But fixing
the race might need to spread the logic as well.

I see it the following way. vprintk_emit() is a producer,
console_unlock() is a consumer, and printk_thread is a room
that allows consumer to do its job. The consumer has more
rooms available. The state variable is a flag showing that
there is a pending job, consumer is looking for a room,
and printk_kthread should offer it.

Of course, it is possible that you will find a better
solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-17 12:19   ` Petr Mladek
@ 2017-03-18  9:57     ` Sergey Senozhatsky
  2017-03-20 16:09       ` Petr Mladek
  0 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-18  9:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

Hello Petr,

On (03/17/17 13:19), Petr Mladek wrote:
[..]
> A solution might be to rename the variable to something like
> printk_pending_output, always set it in vprintk_emit() and
> clear it in console_unlock() when there are no pending messages.

believe it or not, I thought that I set printk_kthread_need_flush_console
to `true' unconditionally. probably I did so in one of the previous patch
set iterations. weird. I agree that doing this makes sense. thanks for
bringing this up.


....

I don't want that printk_kthread_need_flush_console to exist. instead,
I think, I want to move printk_pending out of per-cpu memory and use a
global printk_pending. set PRINTK_PENDING_OUTPUT bit to true in
vprintk_emit(), clear it in console_unlock(). and make both printk_kthread
scheduling condition and console_unlock() retry path depend on
`printk_pending == 0' being true.

something like below (the code is ugly and lacks a ton of barriers, etc.
etc.)


---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 601a9ef6db89..a0b231f49052 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -439,8 +439,6 @@ static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
 static struct task_struct *printk_kthread __read_mostly;
-/* When `true' printing thread has messages to print. */
-static bool printk_kthread_need_flush_console;
 /*
  * We can't call into the scheduler (wake_up() printk kthread), for example,
  * during suspend/kexec. This temporarily switches printk to old behaviour.
@@ -451,6 +449,13 @@ static int printk_kthread_disable __read_mostly;
  * it doesn't go back to 0.
  */
 static bool printk_emergency __read_mostly;
+/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static int printk_pending = 0;
 
 static inline bool printk_kthread_enabled(void)
 {
@@ -1806,6 +1811,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	printk_pending |= PRINTK_PENDING_OUTPUT;
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -1819,8 +1825,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * schedulable context.
 		 */
 		if (printk_kthread_enabled()) {
-			printk_kthread_need_flush_console = true;
-
 			printk_safe_enter_irqsave(flags);
 			wake_up_process(printk_kthread);
 			printk_safe_exit_irqrestore(flags);
@@ -2220,10 +2224,11 @@ void console_unlock(void)
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
-	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool wake_klogd;
+	bool do_cond_resched, retry = false;
 
 	if (console_suspended) {
+		printk_pending &= ~PRINTK_PENDING_OUTPUT;
 		up_console_sem();
 		return;
 	}
@@ -2242,6 +2247,8 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
+	wake_klogd = printk_pending & PRINTK_PENDING_WAKEUP;
+	printk_pending = 0;
 	/*
 	 * We released the console_sem lock, so we need to recheck if
 	 * cpu is online and (if not) is there at least one CON_ANYTIME
@@ -2330,15 +2337,16 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	retry = console_seq != log_next_seq;
+	if (printk_pending != 0 || console_seq != log_next_seq)
+		retry = true;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
-		goto again;
-
 	if (wake_klogd)
 		wake_up_klogd();
+
+	if (retry && console_trylock())
+		goto again;
 }
 EXPORT_SYMBOL(console_unlock);
 
@@ -2722,19 +2730,9 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
-/*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (printk_pending & PRINTK_PENDING_OUTPUT) {
 		if (printk_kthread_enabled()) {
 			wake_up_process(printk_kthread);
 		} else {
@@ -2747,8 +2745,10 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 		}
 	}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
+	if (printk_pending & PRINTK_PENDING_WAKEUP) {
+		printk_pending &= ~PRINTK_PENDING_WAKEUP;
 		wake_up_interruptible(&log_wait);
+	}
 }
 
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
@@ -2760,18 +2760,10 @@ static int printk_kthread_func(void *data)
 {
 	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (!printk_kthread_need_flush_console)
+		if (!(printk_pending & PRINTK_PENDING_OUTPUT))
 			schedule();
 
 		__set_current_state(TASK_RUNNING);
-		/*
-		 * Avoid an infinite loop when console_unlock() cannot
-		 * access consoles, e.g. because console_suspended is
-		 * true. schedule(), someone else will print the messages
-		 * from resume_console().
-		 */
-		printk_kthread_need_flush_console = false;
-
 		console_lock();
 		console_unlock();
 	}
@@ -2802,7 +2794,7 @@ void wake_up_klogd(void)
 {
 	preempt_disable();
 	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		printk_pending |= PRINTK_PENDING_WAKEUP;
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	}
 	preempt_enable();
@@ -2818,7 +2810,6 @@ int printk_deferred(const char *fmt, ...)
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
 

---


[..]
> If I remember correctly, you were not much happy with this
> solution because it did spread the logic. I think that you did not
> believe that it was worth fixing the second problem.

hm, I think Jan Kara was the first one who said that we
are overcomplicating the whole thing... or may be it was me.
don't deny it either.

	-ss

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-18  9:57     ` Sergey Senozhatsky
@ 2017-03-20 16:09       ` Petr Mladek
  2017-03-21  4:01         ` Sergey Senozhatsky
  2017-03-23  9:00         ` Sergey Senozhatsky
  0 siblings, 2 replies; 29+ messages in thread
From: Petr Mladek @ 2017-03-20 16:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Sat 2017-03-18 18:57:39, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (03/17/17 13:19), Petr Mladek wrote:
> [..]
> > A solution might be to rename the variable to something like
> > printk_pending_output, always set it in vprintk_emit() and
> > clear it in console_unlock() when there are no pending messages.
> 
> believe it or not, I thought that I set printk_kthread_need_flush_console
> to `true' unconditionally. probably I did so in one of the previous patch
> set iterations. weird. I agree that doing this makes sense. thanks for
> bringing this up.
> 
> ....
> 
> I don't want that printk_kthread_need_flush_console to exist. instead,
> I think, I want to move printk_pending out of per-cpu memory and use a
> global printk_pending. set PRINTK_PENDING_OUTPUT bit to true in
> vprintk_emit(), clear it in console_unlock(). and make both printk_kthread
> scheduling condition and console_unlock() retry path depend on
> `printk_pending == 0' being true.

I like the idea. The things closely related.
 
> something like below (the code is ugly and lacks a ton of barriers, etc.
> etc.)

Sigh, I wanted to add few comments and it got me deeper than I wanted.

I am sorry if some of my comments are obvious. I know that the
patch was a draft and you probably was aware of many of the
problems.

Anyway, it might make sense to do the change in more steps.
One thing is removing the per-cpu variable. Another thing is changing
the logic and setting/clearing the state variable in different
situation. Yet another thing is adding the support for
kthread, etc. I am afraid that it might be a patchset on its own.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 601a9ef6db89..a0b231f49052 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -439,8 +439,6 @@ static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
>  static struct task_struct *printk_kthread __read_mostly;
> -/* When `true' printing thread has messages to print. */
> -static bool printk_kthread_need_flush_console;
>  /*
>   * We can't call into the scheduler (wake_up() printk kthread), for example,
>   * during suspend/kexec. This temporarily switches printk to old behaviour.
> @@ -451,6 +449,13 @@ static int printk_kthread_disable __read_mostly;
>   * it doesn't go back to 0.
>   */
>  static bool printk_emergency __read_mostly;
> +/*
> + * Delayed printk version, for scheduler-internal messages:
> + */
> +#define PRINTK_PENDING_WAKEUP	0x01
> +#define PRINTK_PENDING_OUTPUT	0x02
> +
> +static int printk_pending = 0;

Something tells me that we need to use atomic_t. Otherwise, we could
not safely manipulate the bits withtout a lock.

Alternative solution would be to use two separate variables.
This might make the code easier to read. I think that they
were combined only to safe space in the per-CPU area.


>  static inline bool printk_kthread_enabled(void)
>  {
> @@ -1806,6 +1811,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
>  
> +	printk_pending |= PRINTK_PENDING_OUTPUT;
>  	logbuf_unlock_irqrestore(flags);
>
>  	/* If called from the scheduler, we can not call up(). */
> @@ -1819,8 +1825,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * schedulable context.
>  		 */
>  		if (printk_kthread_enabled()) {
> -			printk_kthread_need_flush_console = true;
> -
>  			printk_safe_enter_irqsave(flags);
>  			wake_up_process(printk_kthread);
>  			printk_safe_exit_irqrestore(flags);
> @@ -2220,10 +2224,11 @@ void console_unlock(void)
>  	static char text[LOG_LINE_MAX + PREFIX_MAX];
>  	static u64 seen_seq;
>  	unsigned long flags;
> -	bool wake_klogd = false;
> -	bool do_cond_resched, retry;
> +	bool wake_klogd;
> +	bool do_cond_resched, retry = false;
>  
>  	if (console_suspended) {
> +		printk_pending &= ~PRINTK_PENDING_OUTPUT;

Hmm, this is pretty non-intuitive. I guess that it is needed to
avoid a busy cycle in the printk kthread?

I can't find a better solution. We should at least use a name
that makes more sense, e.g. PRINTK_POKE_CONSOLE.

>  		up_console_sem();
>  		return;
>  	}
> @@ -2242,6 +2247,8 @@ void console_unlock(void)
>  	console_may_schedule = 0;
>  
>  again:
> +	wake_klogd = printk_pending & PRINTK_PENDING_WAKEUP;
> +	printk_pending = 0;

This might be racy. PRINTK_PENDING_WAKEUP is set without
a lock in bust_spinlocks() via wake_up_klogd(). The above
code read and clears the state non-atomically.


>  	/*
>  	 * We released the console_sem lock, so we need to recheck if
>  	 * cpu is online and (if not) is there at least one CON_ANYTIME
> @@ -2330,15 +2337,16 @@ void console_unlock(void)
>  	 * flush, no worries.
>  	 */
>  	raw_spin_lock(&logbuf_lock);
> -	retry = console_seq != log_next_seq;
> +	if (printk_pending != 0 || console_seq != log_next_seq)

printk_pending != 0 also when PRINTK_PENDING_WAKEUP is set.
I would do it the other way. I would clear PRINTK_PENDING_OUTPUT
when console_seq == log_next_seq and keep the check as is here.

> +		retry = true;
>  	raw_spin_unlock(&logbuf_lock);
>  	printk_safe_exit_irqrestore(flags);
>  
> -	if (retry && console_trylock())
> -		goto again;
> -
>  	if (wake_klogd)
>  		wake_up_klogd();
> +
> +	if (retry && console_trylock())
> +		goto again;

Why do you actually modify the logic for klogd()?
It might make sense but it is questionable. For example,
klogd() will need logbuf_lock as well. It might fight over
it with the console when the again target is used.
I would do it in separate patch and probably not
in this patchset.


>  }
>  EXPORT_SYMBOL(console_unlock);
>  
> @@ -2722,19 +2730,9 @@ static int __init printk_late_init(void)
>  late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
> -/*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);

BTW: wake_up_klogd_work does not need to be per-CPU as well.
irq_work infrastructure heavily uses per-CPU variables.
But a global struct irq_work is safe, see irq_work_claim().

> 
> 
> [..]
> > If I remember correctly, you were not much happy with this
> > solution because it did spread the logic. I think that you did not
> > believe that it was worth fixing the second problem.
> 
> hm, I think Jan Kara was the first one who said that we
> are overcomplicating the whole thing... or may be it was me.
> don't deny it either.

I do not remember as well :-) Anyway, it really looks more
complicated than I thought.

I think that some clean up and optimization of the printk_pending
stuff is needed and worth it. I am just not sure whether to do it
before or after the printk kthread patchset.

I would slightly prefer to clean the printk_pending stuff first.
It might delay printk kthread patchset a bit but it will be cleaner.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-20 16:09       ` Petr Mladek
@ 2017-03-21  4:01         ` Sergey Senozhatsky
  2017-03-23  9:00         ` Sergey Senozhatsky
  1 sibling, 0 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-21  4:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

On (03/20/17 17:09), Petr Mladek wrote:
[..]
> > I don't want that printk_kthread_need_flush_console to exist. instead,
> > I think, I want to move printk_pending out of per-cpu memory and use a
> > global printk_pending. set PRINTK_PENDING_OUTPUT bit to true in
> > vprintk_emit(), clear it in console_unlock(). and make both printk_kthread
> > scheduling condition and console_unlock() retry path depend on
> > `printk_pending == 0' being true.
> 
> I like the idea. The things closely related.
>  
> > something like below (the code is ugly and lacks a ton of barriers, etc.
> > etc.)
> 
> Sigh, I wanted to add few comments and it got me deeper than I wanted.

no worries, Petr.


[..]
> Anyway, it might make sense to do the change in more steps.

yes, sure. "per-CPU -> global printk_pending" transition first,
and then printk kthread.


[..]
> > +#define PRINTK_PENDING_WAKEUP	0x01
> > +#define PRINTK_PENDING_OUTPUT	0x02
> > +
> > +static int printk_pending = 0;
> 
> Something tells me that we need to use atomic_t. Otherwise, we could
> not safely manipulate the bits withtout a lock.

yes, I'm doing atomic set_bit/test_bit/clear_bit in current (unpublished) version.


> Alternative solution would be to use two separate variables.
> This might make the code easier to read. I think that they
> were combined only to safe space in the per-CPU area.

hm. I think one variable still can work for us; but can split it.

as of rename. dunno. I'm kinda OK with its current name.
PENDING_OUTPUT looks a bit better that POKE_CONSOLE to me.


[..]
> >  	if (console_suspended) {
> > +		printk_pending &= ~PRINTK_PENDING_OUTPUT;
> 
> Hmm, this is pretty non-intuitive. I guess that it is needed to
> avoid a busy cycle in the printk kthread?

it absolutely is.
sorry, the "code" I posted was too cryptic.

> >  		up_console_sem();
> >  		return;
> >  	}
> > @@ -2242,6 +2247,8 @@ void console_unlock(void)
> >  	console_may_schedule = 0;
> >  
> >  again:
> > +	wake_klogd = printk_pending & PRINTK_PENDING_WAKEUP;
> > +	printk_pending = 0;
> 
> This might be racy. PRINTK_PENDING_WAKEUP is set without
> a lock in bust_spinlocks() via wake_up_klogd(). The above
> code read and clears the state non-atomically.

the patch I'm looking at right now does atomic set_bit() and a bunch of
atomic test_and_clear_bit/test_bit/etc.

> >  	/*
> >  	 * We released the console_sem lock, so we need to recheck if
> >  	 * cpu is online and (if not) is there at least one CON_ANYTIME
> > @@ -2330,15 +2337,16 @@ void console_unlock(void)
> >  	 * flush, no worries.
> >  	 */
> >  	raw_spin_lock(&logbuf_lock);
> > -	retry = console_seq != log_next_seq;
> > +	if (printk_pending != 0 || console_seq != log_next_seq)
> 
> printk_pending != 0 also when PRINTK_PENDING_WAKEUP is set.

yes.

> I would do it the other way. I would clear PRINTK_PENDING_OUTPUT
> when console_seq == log_next_seq and keep the check as is here.
[..]
> > +		retry = true;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	printk_safe_exit_irqrestore(flags);
> >  
> > -	if (retry && console_trylock())
> > -		goto again;
> > -
> >  	if (wake_klogd)
> >  		wake_up_klogd();
> > +
> > +	if (retry && console_trylock())
> > +		goto again;
> 
> Why do you actually modify the logic for klogd()?
> It might make sense but it is questionable. For example,
> klogd() will need logbuf_lock as well. It might fight over
> it with the console when the again target is used.
> I would do it in separate patch and probably not
> in this patchset.

I just wanted to keep printk_prnding check simpler and I figured out
that klogd logbuf_lock contention will not be something new, because
of the while() loop in kthread_printk function

	printk_tkread func
		while (1) {
			if (!pending_output)
				schedule();

			console_lock()
			console_unlock()
				wake_up klogd	/*
						 * and may be do another
						 * console_lock() straight ahead if pending_output != 0
						 */
		}

but yes. I'll drop that part and will handle only PRINTK_PENDING_OUTPUT
bit in console_unlock(), leaving the PRINTK_PENDING_WAKEUP stuff to
irq work.

I'll try to send out a refreshed version soon.


> >  EXPORT_SYMBOL(console_unlock);
> >  
> > @@ -2722,19 +2730,9 @@ static int __init printk_late_init(void)
> >  late_initcall(printk_late_init);
> >  
> >  #if defined CONFIG_PRINTK
> > -/*
> > - * Delayed printk version, for scheduler-internal messages:
> > - */
> > -#define PRINTK_PENDING_WAKEUP	0x01
> > -#define PRINTK_PENDING_OUTPUT	0x02
> > -
> > -static DEFINE_PER_CPU(int, printk_pending);
> 
> BTW: wake_up_klogd_work does not need to be per-CPU as well.
> irq_work infrastructure heavily uses per-CPU variables.
> But a global struct irq_work is safe, see irq_work_claim().

interesting. need to look at it.

we also can move printk_kthread irq_work out of per-CPU and drop
the whole 'if (printk_safe_irq_ready) smp_rmb() ' thing in this case
and simplify printk_safe_init().

may be in a separate patch set, though. since this is not really
related to printk kthread.


> > [..]
> > > If I remember correctly, you were not much happy with this
> > > solution because it did spread the logic. I think that you did not
> > > believe that it was worth fixing the second problem.
> > 
> > hm, I think Jan Kara was the first one who said that we
> > are overcomplicating the whole thing... or may be it was me.
> > don't deny it either.
> 
> I do not remember as well :-) Anyway, it really looks more
> complicated than I thought.
> 
> I think that some clean up and optimization of the printk_pending
> stuff is needed and worth it. I am just not sure whether to do it
> before or after the printk kthread patchset.
> 
> I would slightly prefer to clean the printk_pending stuff first.
> It might delay printk kthread patchset a bit but it will be cleaner.

absolutely agree. and thanks for looking into it.

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places
  2017-03-06 12:45 ` [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places Sergey Senozhatsky
@ 2017-03-22 15:38     ` Petr Mladek
  0 siblings, 0 replies; 29+ messages in thread
From: Petr Mladek @ 2017-03-22 15:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, linux-pm, Eric Biederman,
	kexec, Sergey Senozhatsky

(added Eric, kexec mailing likst, and linux-pm mailing list
into CC because we touch their code)

On Mon 2017-03-06 21:45:53, Sergey Senozhatsky wrote:
> It's not always possible/safe to wake_up() printk kernel
> thread. For example, late suspend/early resume may printk()
> while timekeeping is not initialized yet, so calling into the
> scheduler may result in recursive warnings. Thus we need
> printk() to operate in old mode there and attempt to immediately
> flush pending kernel message to the console. This patch adds
> console_printing_thread_off/on sections.

This patch helps to avoid the warning. But IMHO, even more
important effect is that it helps to actually see the
massages when things go wrong, the kthread is never
called later, and we even do not end up in panic()
that would flush them to the console.

I am afraid that there are more locations where
the old mode need to be forced. It might take
a while until they are found and fixed.

I would really like to have an option that would
force the old mode all the time. It might help
a lot when you debug a system freeze and the messages
are not printed because of the deferred console handling.

Otherwise, the location of console_printing_thread_off()/on()
calls looks reasonable to me. But it would be great to get
some confirmation from kexec and pm people.

Best Regards,
Petr


> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/kexec_core.c      | 4 ++++
>  kernel/power/hibernate.c | 8 ++++++++
>  kernel/power/suspend.c   | 4 ++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index bfe62d5b3872..b40fba31150c 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1496,6 +1496,8 @@ int kernel_kexec(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
> +
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (kexec_image->preserve_context) {
>  		lock_system_sleep();
> @@ -1565,6 +1567,8 @@ int kernel_kexec(void)
>  	}
>  #endif
>  
> +	console_printing_thread_on();
> +
>   Unlock:
>  	mutex_unlock(&kexec_mutex);
>  	return error;
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index a8b978c35a6a..15634841ccc6 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -502,6 +502,7 @@ int hibernation_restore(int platform_mode)
>  {
>  	int error;
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	suspend_console();
>  	pm_restrict_gfp_mask();
> @@ -519,6 +520,7 @@ int hibernation_restore(int platform_mode)
>  	pm_restore_gfp_mask();
>  	resume_console();
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	return error;
>  }
>  
> @@ -542,6 +544,7 @@ int hibernation_platform_enter(void)
>  		goto Close;
>  
>  	entering_platform_hibernation = true;
> +	console_printing_thread_off();
>  	suspend_console();
>  	error = dpm_suspend_start(PMSG_HIBERNATE);
>  	if (error) {
> @@ -589,6 +592,7 @@ int hibernation_platform_enter(void)
>  	entering_platform_hibernation = false;
>  	dpm_resume_end(PMSG_RESTORE);
>  	resume_console();
> +	console_printing_thread_on();
>  
>   Close:
>  	hibernation_ops->end();
> @@ -692,6 +696,7 @@ int hibernate(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
>  	if (error) {
> @@ -759,6 +764,7 @@ int hibernate(void)
>   Exit:
>  	__pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	atomic_inc(&snapshot_device_available);
>   Unlock:
>  	unlock_system_sleep();
> @@ -868,6 +874,7 @@ static int software_resume(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls);
>  	if (error) {
> @@ -884,6 +891,7 @@ static int software_resume(void)
>   Finish:
>  	__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	atomic_inc(&snapshot_device_available);
>  	/* For success case, the suspend path will release the lock */
>   Unlock:
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 15e6baef5c73..7c6e228b7de4 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -433,6 +433,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>  	if (!sleep_state_supported(state))
>  		return -ENOSYS;
>  
> +	console_printing_thread_off();
>  	error = platform_suspend_begin(state);
>  	if (error)
>  		goto Close;
> @@ -462,6 +463,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>  
>   Close:
>  	platform_resume_end(state);
> +	console_printing_thread_on();
>  	return error;
>  
>   Recover_platform:
> @@ -520,6 +522,7 @@ static int enter_state(suspend_state_t state)
>  #endif
>  
>  	pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
> +	console_printing_thread_off();
>  	pm_suspend_clear_flags();
>  	error = suspend_prepare(state);
>  	if (error)
> @@ -537,6 +540,7 @@ static int enter_state(suspend_state_t state)
>   Finish:
>  	pr_debug("PM: Finishing wakeup.\n");
>  	suspend_finish();
> +	console_printing_thread_on();
>   Unlock:
>  	mutex_unlock(&pm_mutex);
>  	return error;
> -- 
> 2.12.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places
@ 2017-03-22 15:38     ` Petr Mladek
  0 siblings, 0 replies; 29+ messages in thread
From: Petr Mladek @ 2017-03-22 15:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: linux-pm, Peter Zijlstra, kexec, Rafael J . Wysocki,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, Eric Biederman,
	Andrew Morton, Linus Torvalds

(added Eric, kexec mailing likst, and linux-pm mailing list
into CC because we touch their code)

On Mon 2017-03-06 21:45:53, Sergey Senozhatsky wrote:
> It's not always possible/safe to wake_up() printk kernel
> thread. For example, late suspend/early resume may printk()
> while timekeeping is not initialized yet, so calling into the
> scheduler may result in recursive warnings. Thus we need
> printk() to operate in old mode there and attempt to immediately
> flush pending kernel message to the console. This patch adds
> console_printing_thread_off/on sections.

This patch helps to avoid the warning. But IMHO, even more
important effect is that it helps to actually see the
massages when things go wrong, the kthread is never
called later, and we even do not end up in panic()
that would flush them to the console.

I am afraid that there are more locations where
the old mode need to be forced. It might take
a while until they are found and fixed.

I would really like to have an option that would
force the old mode all the time. It might help
a lot when you debug a system freeze and the messages
are not printed because of the deferred console handling.

Otherwise, the location of console_printing_thread_off()/on()
calls looks reasonable to me. But it would be great to get
some confirmation from kexec and pm people.

Best Regards,
Petr


> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/kexec_core.c      | 4 ++++
>  kernel/power/hibernate.c | 8 ++++++++
>  kernel/power/suspend.c   | 4 ++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index bfe62d5b3872..b40fba31150c 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1496,6 +1496,8 @@ int kernel_kexec(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
> +
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (kexec_image->preserve_context) {
>  		lock_system_sleep();
> @@ -1565,6 +1567,8 @@ int kernel_kexec(void)
>  	}
>  #endif
>  
> +	console_printing_thread_on();
> +
>   Unlock:
>  	mutex_unlock(&kexec_mutex);
>  	return error;
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index a8b978c35a6a..15634841ccc6 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -502,6 +502,7 @@ int hibernation_restore(int platform_mode)
>  {
>  	int error;
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	suspend_console();
>  	pm_restrict_gfp_mask();
> @@ -519,6 +520,7 @@ int hibernation_restore(int platform_mode)
>  	pm_restore_gfp_mask();
>  	resume_console();
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	return error;
>  }
>  
> @@ -542,6 +544,7 @@ int hibernation_platform_enter(void)
>  		goto Close;
>  
>  	entering_platform_hibernation = true;
> +	console_printing_thread_off();
>  	suspend_console();
>  	error = dpm_suspend_start(PMSG_HIBERNATE);
>  	if (error) {
> @@ -589,6 +592,7 @@ int hibernation_platform_enter(void)
>  	entering_platform_hibernation = false;
>  	dpm_resume_end(PMSG_RESTORE);
>  	resume_console();
> +	console_printing_thread_on();
>  
>   Close:
>  	hibernation_ops->end();
> @@ -692,6 +696,7 @@ int hibernate(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
>  	if (error) {
> @@ -759,6 +764,7 @@ int hibernate(void)
>   Exit:
>  	__pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	atomic_inc(&snapshot_device_available);
>   Unlock:
>  	unlock_system_sleep();
> @@ -868,6 +874,7 @@ static int software_resume(void)
>  		goto Unlock;
>  	}
>  
> +	console_printing_thread_off();
>  	pm_prepare_console();
>  	error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls);
>  	if (error) {
> @@ -884,6 +891,7 @@ static int software_resume(void)
>   Finish:
>  	__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
>  	pm_restore_console();
> +	console_printing_thread_on();
>  	atomic_inc(&snapshot_device_available);
>  	/* For success case, the suspend path will release the lock */
>   Unlock:
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 15e6baef5c73..7c6e228b7de4 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -433,6 +433,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>  	if (!sleep_state_supported(state))
>  		return -ENOSYS;
>  
> +	console_printing_thread_off();
>  	error = platform_suspend_begin(state);
>  	if (error)
>  		goto Close;
> @@ -462,6 +463,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>  
>   Close:
>  	platform_resume_end(state);
> +	console_printing_thread_on();
>  	return error;
>  
>   Recover_platform:
> @@ -520,6 +522,7 @@ static int enter_state(suspend_state_t state)
>  #endif
>  
>  	pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
> +	console_printing_thread_off();
>  	pm_suspend_clear_flags();
>  	error = suspend_prepare(state);
>  	if (error)
> @@ -537,6 +540,7 @@ static int enter_state(suspend_state_t state)
>   Finish:
>  	pr_debug("PM: Finishing wakeup.\n");
>  	suspend_finish();
> +	console_printing_thread_on();
>   Unlock:
>  	mutex_unlock(&pm_mutex);
>  	return error;
> -- 
> 2.12.0
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 4/4] printk: enable printk offloading
  2017-03-06 12:45 ` [RFC][PATCH 4/4] printk: enable printk offloading Sergey Senozhatsky
@ 2017-03-22 15:43   ` Petr Mladek
  2017-03-22 16:40     ` Sergey Senozhatsky
  0 siblings, 1 reply; 29+ messages in thread
From: Petr Mladek @ 2017-03-22 15:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Mon 2017-03-06 21:45:54, Sergey Senozhatsky wrote:
> Initialize the kernel printing thread and enable printk()
> offloading.

The patch looks fine to me. But it might get more complicated
if we decide to have a global option to avoid the kthread.
Let's discuss this in the 3rd patch first.

Best Regards,
Petr

> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 6e00073a7331..601a9ef6db89 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2779,6 +2779,25 @@ static int printk_kthread_func(void *data)
>  	return 0;
>  }
>  
> +/*
> + * Init printk kthread at late_initcall stage, after core/arch/device/etc.
> + * initialization.
> + */
> +static int __init init_printk_kthread(void)
> +{
> +	struct task_struct *thread;
> +
> +	thread = kthread_run(printk_kthread_func, NULL, "printk");
> +	if (IS_ERR(thread)) {
> +		pr_err("printk: unable to create printing thread\n");
> +		return PTR_ERR(thread);
> +	}
> +
> +	printk_kthread = thread;
> +	return 0;
> +}
> +late_initcall(init_printk_kthread);
> +
>  void wake_up_klogd(void)
>  {
>  	preempt_disable();
> -- 
> 2.12.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 4/4] printk: enable printk offloading
  2017-03-22 15:43   ` Petr Mladek
@ 2017-03-22 16:40     ` Sergey Senozhatsky
  0 siblings, 0 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-22 16:40 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

Hello,

On (03/22/17 16:43), Petr Mladek wrote:
> The patch looks fine to me. But it might get more complicated
> if we decide to have a global option to avoid the kthread.
> Let's discuss this in the 3rd patch first.

thanks for taking a look.

so I was going to publish a new patch set tonight, but now I think I'll
wait for a day or two, since there are potentially more people looking
at 0003 now.

in the meantime, I uploaded the patches to
https://github.com/sergey-senozhatsky/linux-next-ss/commits/printk-kthread

there are some splits/reworks/etc.

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/4] printk: introduce printing kernel thread
  2017-03-06 12:45 ` [RFC][PATCH 1/4] " Sergey Senozhatsky
@ 2017-03-22 16:40   ` Petr Mladek
  2017-03-23  5:12     ` Sergey Senozhatsky
  0 siblings, 1 reply; 29+ messages in thread
From: Petr Mladek @ 2017-03-22 16:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky, Jan Kara

On Mon 2017-03-06 21:45:51, Sergey Senozhatsky wrote:
> This patch introduces a dedicated printing kernel thread - printk_kthread.
> The main purpose of this kthread is to offload printing to a non-atomic
> and always scheduleable context, which eliminates 4) and makes 1)-3) less
> critical. printk() now just appends log messages to the kernel log buffer
> and wake_up()s printk_kthread instead of locking console_sem and calling
> into potentially unsafe console_unlock().
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> +/*
> + * This disables printing offloading and instead attempts
> + * to do the usual console_trylock()->console_unlock().
> + *
> + * Note, this does not stop the printk_kthread if it's already
> + * printing logbuf messages.
> + */
> +void console_printing_thread_off(void)
> +{
> +	printk_kthread_disable++;
> +	barrier();
> +}
> +
> +/* This re-enables printk_kthread offloading. */
> +void console_printing_thread_on(void)
> +{
> +	barrier();
> +	printk_kthread_disable--;
> +}

I really like that these functions are re-entrant. It will make
our life much easier.

Just a small nitpicking. I would prefer to use the name
console_printk_kthread_off()/on(). I was several times confused
by "printing_thread" when searching the sources. The common
sub-string "printk_kthread" for all the related stuff would
make my life easier ;-)

Just an idea. printk() and printk_deferred() behave the same way
when the kthread is enabled. I wonder if we should make it more
explicit by using names like:

   printk_deferred_mode_disabled++;
   printk_deferred_mode_off();
   printk_deferred_mode_on();

Also it is an already know term and a more generic name. This API
is used globally while the kthread is an implementation detail.
The offloading might be done another way in the future.

Finally, I think about using this variable instead of the ugly
LOGLEVEL_SCHED. The catch is that LOGLEVEL_SCHED forces
the deferred mode while these functions force the opposite.

I just think loudly. I wonder what is better to help
people understand the code in the future.


Otherwise, the patch looks fine to me. Well, there are some
related things discussed in the other patches that might
affect it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2017-03-06 12:45 ` [RFC][PATCH 4/4] printk: enable printk offloading Sergey Senozhatsky
@ 2017-03-22 17:59 ` Peter Zijlstra
  2017-03-23  4:09   ` Sergey Senozhatsky
  2017-03-23 12:01   ` Petr Mladek
  2017-04-03 11:53 ` Sergey Senozhatsky
  5 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-03-22 17:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Petr Mladek, Andrew Morton, Linus Torvalds,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Mon, Mar 06, 2017 at 09:45:50PM +0900, Sergey Senozhatsky wrote:
>  sysrq is potentially even trickier. can we always wake_up() kernel
>  thread from sysrq? there probably might be cases when we can't rely
>  on the scheduler.

sysrq runs from interrupt context, right? Should be able to do wakeups.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-22 17:59 ` [RFC][PATCH 0/4] printk: introduce printing kernel thread Peter Zijlstra
@ 2017-03-23  4:09   ` Sergey Senozhatsky
  2017-03-23  8:51     ` Peter Zijlstra
  2017-03-23 12:01   ` Petr Mladek
  1 sibling, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-23  4:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Steven Rostedt, Petr Mladek, Andrew Morton,
	Linus Torvalds, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

Hello Peter,

thanks for taking a look.

On (03/22/17 18:59), Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 09:45:50PM +0900, Sergey Senozhatsky wrote:
> >  sysrq is potentially even trickier. can we always wake_up() kernel
> >  thread from sysrq? there probably might be cases when we can't rely
> >  on the scheduler.
> 
> sysrq runs from interrupt context, right? Should be able to do wakeups.

what I though about was -
	what if there are 'misbehaving' higher prio tasks all the time?
	the existing sysrq would attempt to do printing from irq context
	so it doesn't care about run queues.

does it make sense to you?

so what I have currently is something like this:
(not so sure about sysrq_handle_showstate_blocked())

---
 drivers/tty/sysrq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index c6fc7141d7b2..f0d2684fa99c 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -49,6 +49,7 @@
 #include <linux/syscalls.h>
 #include <linux/of.h>
 #include <linux/rcupdate.h>
+#include <linux/console.h>
 
 #include <asm/ptrace.h>
 #include <asm/irq_regs.h>
@@ -239,6 +240,7 @@ static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
 
 static void sysrq_handle_showallcpus(int key)
 {
+	console_printing_thread_off();
 	/*
 	 * Fall back to the workqueue based printing if the
 	 * backtrace printing did not succeed or the
@@ -253,6 +255,7 @@ static void sysrq_handle_showallcpus(int key)
 		}
 		schedule_work(&sysrq_showallcpus);
 	}
+	console_printing_thread_on();
 }
 
 static struct sysrq_key_op sysrq_showallcpus_op = {
@@ -279,8 +282,10 @@ static struct sysrq_key_op sysrq_showregs_op = {
 
 static void sysrq_handle_showstate(int key)
 {
+	console_printing_thread_off();
 	show_state();
 	show_workqueue_state();
+	console_printing_thread_on();
 }
 static struct sysrq_key_op sysrq_showstate_op = {
 	.handler	= sysrq_handle_showstate,
@@ -291,7 +296,9 @@ static struct sysrq_key_op sysrq_showstate_op = {
 
 static void sysrq_handle_showstate_blocked(int key)
 {
+	console_printing_thread_off();
 	show_state_filter(TASK_UNINTERRUPTIBLE);
+	console_printing_thread_on();
 }
 static struct sysrq_key_op sysrq_showstate_blocked_op = {
 	.handler	= sysrq_handle_showstate_blocked,
-- 
2.12.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/4] printk: introduce printing kernel thread
  2017-03-22 16:40   ` Petr Mladek
@ 2017-03-23  5:12     ` Sergey Senozhatsky
  2017-03-23 10:40       ` Petr Mladek
  0 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-23  5:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky, Jan Kara

On (03/22/17 17:40), Petr Mladek wrote:
[..]
> > +void console_printing_thread_off(void)
> > +{
> > +	printk_kthread_disable++;
> > +	barrier();
> > +}
> > +
> > +/* This re-enables printk_kthread offloading. */
> > +void console_printing_thread_on(void)
> > +{
> > +	barrier();
> > +	printk_kthread_disable--;
> > +}
> 
> I really like that these functions are re-entrant. It will make
> our life much easier.
>
> Just a small nitpicking. I would prefer to use the name
> console_printk_kthread_off()/on(). I was several times confused
> by "printing_thread" when searching the sources. The common
> sub-string "printk_kthread" for all the related stuff would
> make my life easier ;-)

well, I guess I can rename it.

one observation here is that those functions neither turn off nor
disable printk_kthread. we mark the point after which we will not
wake_up printk_kthread, but that does not mean that printk_kthread
is already or soon will be inactive. it actually can be in running
state. so the name

console_do_not_offload_printing_to_printk_kthread_unless_its_already_running()

will describe it better.  :)

---

overall the whole need of off()/on() sections is slightly worrisome.
there are probably not so many cases when we need to forcibly avoid
wake_up(printk_kthread) calls, but we don't have any checks/mechanisms
to verify it (like, for example, might_sleep()). so it's case by case.


> Just an idea. printk() and printk_deferred() behave the same way
> when the kthread is enabled. I wonder if we should make it more
> explicit by using names like:
> 
>    printk_deferred_mode_disabled++;
>    printk_deferred_mode_off();
>    printk_deferred_mode_on();

hm, I certainly see what you meant here, but I suspect this naming may
be a bit misleading - "so printk_deferred_mode_off() disables printk_deferred()?"


> Also it is an already know term and a more generic name. This API
> is used globally while the kthread is an implementation detail.
> The offloading might be done another way in the future.

yes, this is why I avoided mentioning "printk_kthread" (directly)
in API naming. console_printing_thread is sort of neutral (well,
sort of). not insisting that the naming is perfect, of course.


> Finally, I think about using this variable instead of the ugly
> LOGLEVEL_SCHED. The catch is that LOGLEVEL_SCHED forces
> the deferred mode while these functions force the opposite.

hm, sorry. not sure I see how we can do this.
LOGLEVEL_SCHED hack is... hacky. yes.

when we examine `printk_kthread_disable' we decide wether we must
	a) wake_up() printk_kthread
or
	b) do console_trylock()

the LOGLEVEL_SCHED thing is completely different tho. it tells us that
neither of the above is safe -- both wake_up() and console_trylock()
can potentially call into the scheduler. so I'm not sure we can easily
replace LOGLEVEL_SCHED with `printk_kthread_disable'.

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-23  4:09   ` Sergey Senozhatsky
@ 2017-03-23  8:51     ` Peter Zijlstra
  2017-03-24  1:59       ` Sergey Senozhatsky
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-03-23  8:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Petr Mladek, Andrew Morton, Linus Torvalds,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Thu, Mar 23, 2017 at 01:09:58PM +0900, Sergey Senozhatsky wrote:
> Hello Peter,
> 
> thanks for taking a look.
> 
> On (03/22/17 18:59), Peter Zijlstra wrote:
> > On Mon, Mar 06, 2017 at 09:45:50PM +0900, Sergey Senozhatsky wrote:
> > >  sysrq is potentially even trickier. can we always wake_up() kernel
> > >  thread from sysrq? there probably might be cases when we can't rely
> > >  on the scheduler.
> > 
> > sysrq runs from interrupt context, right? Should be able to do wakeups.
> 
> what I though about was -
> 	what if there are 'misbehaving' higher prio tasks all the time?
> 	the existing sysrq would attempt to do printing from irq context
> 	so it doesn't care about run queues.
> 
> does it make sense to you?

Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
whole printk thread thing. Also those function names are horrifically
long.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-20 16:09       ` Petr Mladek
  2017-03-21  4:01         ` Sergey Senozhatsky
@ 2017-03-23  9:00         ` Sergey Senozhatsky
  2017-03-23 12:11           ` Petr Mladek
  1 sibling, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-23  9:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

On (03/20/17 17:09), Petr Mladek wrote:
[..]
> 
> BTW: wake_up_klogd_work does not need to be per-CPU as well.
> irq_work infrastructure heavily uses per-CPU variables.
> But a global struct irq_work is safe, see irq_work_claim().

so I have a patch that turns wake_up_klogd_work into a global variable,
out of curiosity, but I'm not entire sure about it. the sort of a problem
is that queued irq_works still go into a per-CPU run_lists.  per-CPU
wake_up_klogd_work permits us to queue irq work on several CPUs so we
might have better chances to execute wake_up_klogd_work_func(), while
global wake_up_klogd_work will be only in one run_list. this can defer
wake_up_klogd_work processing until that particular single CPU handles
its interrupt and calls irq_work_run_list(). what do you think?

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/4] printk: introduce printing kernel thread
  2017-03-23  5:12     ` Sergey Senozhatsky
@ 2017-03-23 10:40       ` Petr Mladek
  2017-03-24  5:20         ` Sergey Senozhatsky
  0 siblings, 1 reply; 29+ messages in thread
From: Petr Mladek @ 2017-03-23 10:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky, Jan Kara

On Thu 2017-03-23 14:12:42, Sergey Senozhatsky wrote:
> On (03/22/17 17:40), Petr Mladek wrote:
> [..]
> > > +void console_printing_thread_off(void)
> > > +{
> > > +	printk_kthread_disable++;
> > > +	barrier();
> > > +}
> > > +
> > > +/* This re-enables printk_kthread offloading. */
> > > +void console_printing_thread_on(void)
> > > +{
> > > +	barrier();
> > > +	printk_kthread_disable--;
> > > +}
> > 
> 
> one observation here is that those functions neither turn off nor
> disable printk_kthread. we mark the point after which we will not
> wake_up printk_kthread, but that does not mean that printk_kthread
> is already or soon will be inactive. it actually can be in running
> state. so the name

Yup, allow/deny might might be better than on/off.

> hm, I certainly see what you meant here, but I suspect this naming may
> be a bit misleading - "so printk_deferred_mode_off() disables printk_deferred()?"

True, sigh.

> > Also it is an already know term and a more generic name. This API
> > is used globally while the kthread is an implementation detail.
> > The offloading might be done another way in the future.
> 
> yes, this is why I avoided mentioning "printk_kthread" (directly)
> in API naming. console_printing_thread is sort of neutral (well,
> sort of). not insisting that the naming is perfect, of course.

It is too close to "printk_thread" used on different place.
So, it confuses me :-)


> the LOGLEVEL_SCHED thing is completely different tho. it tells us that
> neither of the above is safe -- both wake_up() and console_trylock()
> can potentially call into the scheduler. so I'm not sure we can easily
> replace LOGLEVEL_SCHED with `printk_kthread_disable'.

Grr, you are right. LOGLEVEL_SHED is deferred another way.

Let me do one more attempt for a generic name. What about?

    printk_console_press();    or try_harder() or push()
    printk_console_relax();

or something like this?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-22 17:59 ` [RFC][PATCH 0/4] printk: introduce printing kernel thread Peter Zijlstra
  2017-03-23  4:09   ` Sergey Senozhatsky
@ 2017-03-23 12:01   ` Petr Mladek
  1 sibling, 0 replies; 29+ messages in thread
From: Petr Mladek @ 2017-03-23 12:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

On Wed 2017-03-22 18:59:20, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 09:45:50PM +0900, Sergey Senozhatsky wrote:
> >  sysrq is potentially even trickier. can we always wake_up() kernel
> >  thread from sysrq? there probably might be cases when we can't rely
> >  on the scheduler.
> 
> sysrq runs from interrupt context, right? Should be able to do wakeups.

It would make sense to actually switch to the old mode when
handling sysrq. At least for some requests that are used
for debugging when the system is not responsible.

It is pity that it is the irq context that is prone to
softlocks. But this might be the only way to actually
see the messages.

Tetsuo already suggested to use the old mode for SysRq-t, see
https://lkml.kernel.org/r/201612261954.FJE69201.OFLVtFJSQFOHMO@I-love.SAKURA.ne.jp

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func()
  2017-03-23  9:00         ` Sergey Senozhatsky
@ 2017-03-23 12:11           ` Petr Mladek
  0 siblings, 0 replies; 29+ messages in thread
From: Petr Mladek @ 2017-03-23 12:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Thu 2017-03-23 18:00:42, Sergey Senozhatsky wrote:
> On (03/20/17 17:09), Petr Mladek wrote:
> [..]
> > 
> > BTW: wake_up_klogd_work does not need to be per-CPU as well.
> > irq_work infrastructure heavily uses per-CPU variables.
> > But a global struct irq_work is safe, see irq_work_claim().
> 
> so I have a patch that turns wake_up_klogd_work into a global variable,
> out of curiosity, but I'm not entire sure about it. the sort of a problem
> is that queued irq_works still go into a per-CPU run_lists.  per-CPU
> wake_up_klogd_work permits us to queue irq work on several CPUs so we
> might have better chances to execute wake_up_klogd_work_func(), while
> global wake_up_klogd_work will be only in one run_list. this can defer
> wake_up_klogd_work processing until that particular single CPU handles
> its interrupt and calls irq_work_run_list(). what do you think?

Good question! I personally think that it should not cause a big harm
but I am not completely sure. It might need some more testing.

Let's postpone this change and do it alone in the future.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-23  8:51     ` Peter Zijlstra
@ 2017-03-24  1:59       ` Sergey Senozhatsky
  2017-03-24  4:43         ` Sergey Senozhatsky
  2017-03-24 14:43         ` Petr Mladek
  0 siblings, 2 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-24  1:59 UTC (permalink / raw)
  To: Peter Zijlstra, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

On (03/23/17 09:51), Peter Zijlstra wrote:
[..]
> > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > 
> > what I though about was -
> > 	what if there are 'misbehaving' higher prio tasks all the time?
> > 	the existing sysrq would attempt to do printing from irq context
> > 	so it doesn't care about run queues.
> > 
> > does it make sense to you?
> 
> Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> whole printk thread thing.

I see your point.
but I can't think of alternatives that would fix all those lockups and
stalls and at the same time have better guarantees than printk_kthread.


> Also those function names are horrifically long.

right. not happy with the naming either.

so what I'm thinking about right now is:

we have that thing which we call "old printk" mode, which is not
really informative. and my proposal is rename "old" mode and use
"printk rescue" mode instead. because we switch to that mode when
we are trying to "rescue" kernel logs. so the API can be something
like
		printk_rescue_on()
		printk_rescue_off()

opinions?



--- random thoughts ---

another thing that bothers me a bit is that we need to place those
printk_rescue_on/printk_rescue_off switches all over the kernel.
sort of a root cause [in some of the cases] here is the fact that
we don't have any feedback from printk_kthread in vprintk_emit():
	does printk_kthread make any progress?
	do we flush messages to the serial console?
	etc.

and we've got everything we need to have such a feedback in
vprintk_emit():

	a) console is not suspended so console_unlock() can call console drivers
	b) printk_kthread != NULL
	c) we are not in enforced rescue/emergency mode
	d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
	e) `console_seq' stands still

so we can have an automatic rescue mode fallback in vprintk_emit().
if (a)-(e) are true then we give up on waking up printk_kthread,
switch to rescue mode and attempt to console_trylock() directly from
vprintk_emit(). the part that sucks here is that we need to give
printk_kthread some time to catch up. for instance, if (e) is true
for the past 50 invocations of vprintk_emit(), IOW:

	- we added 50 lines to printk
	- none have been printed on the serial console

then we
	- declare rescue
	- do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()


50 is completely made up. can be 100, or 10, or anything else.


and we can jump between rescue-normal printk modes, so this should
also return back the throttling of tasks which printk() a lot (item
(2) in patch set cover letter) that we used to have.


thoughts?

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-24  1:59       ` Sergey Senozhatsky
@ 2017-03-24  4:43         ` Sergey Senozhatsky
  2017-03-24 14:43         ` Petr Mladek
  1 sibling, 0 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-24  4:43 UTC (permalink / raw)
  To: Peter Zijlstra, Petr Mladek
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

On (03/24/17 10:59), Sergey Senozhatsky wrote:
[..]
> and we can jump between rescue-normal printk modes, so this should
> also return back the throttling of tasks which printk() a lot (item
> (2) in patch set cover letter) that we used to have.

ok, I obviously lied here "this should also return back the throttling".
throttling is a bit hard to implement properly.


automatic printk rescue mode on console output stall [when we can't
wake_up printk_kthread] is not so difficult. something like below.

may be it's too simple minded. // it does not handle throttling.


---
 kernel/printk/printk.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e84b9fbb298f..005741486376 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -447,6 +447,12 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How many pending messages we can have in the logbuf before we
+ * switch to printk rescue mode (stop waking up printk_kthread).
+ */
+#define CONSOLE_STALL_MESSAGES_LIMIT	50
+
 static struct task_struct *printk_kthread __read_mostly;
 /*
  * We can't call into the scheduler (wake_up() printk kthread) during
@@ -1739,6 +1745,11 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
 
+static bool console_output_stall(void)
+{
+	return (log_next_seq - console_seq) > CONSOLE_STALL_MESSAGES_LIMIT;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
@@ -1750,6 +1761,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int printed_len = 0;
 	bool in_sched = false;
+	bool printk_stall;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -1811,6 +1823,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	printk_stall = console_output_stall();
 	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
 	logbuf_unlock_irqrestore(flags);
 
@@ -1824,7 +1837,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * from a dedicated printk_kthread, which always runs in
 		 * schedulable context.
 		 */
-		if (printk_kthread_enabled()) {
+		if (!printk_stall && printk_kthread_enabled()) {
 			printk_safe_enter_irqsave(flags);
 			wake_up_process(printk_kthread);
 			printk_safe_exit_irqrestore(flags);
-- 
2.12.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/4] printk: introduce printing kernel thread
  2017-03-23 10:40       ` Petr Mladek
@ 2017-03-24  5:20         ` Sergey Senozhatsky
  0 siblings, 0 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-24  5:20 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky, Jan Kara

On (03/23/17 11:40), Petr Mladek wrote:
[..]
> Let me do one more attempt for a generic name. What about?
> 
>     printk_console_press();    or try_harder() or push()
>     printk_console_relax();
> 
> or something like this?

I replied in another thread.
how about printk_rescue?

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-24  1:59       ` Sergey Senozhatsky
  2017-03-24  4:43         ` Sergey Senozhatsky
@ 2017-03-24 14:43         ` Petr Mladek
  2017-03-25  0:18           ` Sergey Senozhatsky
  1 sibling, 1 reply; 29+ messages in thread
From: Petr Mladek @ 2017-03-24 14:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Fri 2017-03-24 10:59:36, Sergey Senozhatsky wrote:
> On (03/23/17 09:51), Peter Zijlstra wrote:
> [..]
> > > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > > 
> > > what I though about was -
> > > 	what if there are 'misbehaving' higher prio tasks all the time?
> > > 	the existing sysrq would attempt to do printing from irq context
> > > 	so it doesn't care about run queues.
> > > 
> > > does it make sense to you?
> > 
> > Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> > whole printk thread thing.
> 
> I see your point.
> but I can't think of alternatives that would fix all those lockups and
> stalls and at the same time have better guarantees than printk_kthread.
> 
> 
> > Also those function names are horrifically long.
> 
> right. not happy with the naming either.
> 
> so what I'm thinking about right now is:
> 
> we have that thing which we call "old printk" mode, which is not
> really informative. and my proposal is rename "old" mode and use
> "printk rescue" mode instead. because we switch to that mode when
> we are trying to "rescue" kernel logs. so the API can be something
> like
> 		printk_rescue_on()
> 		printk_rescue_off()

Sounds good to me. Slight problem is that off() does not cause
stopping the mode if we are nested.

Just one more attempt inspired by this:

		printk_emergency_begin()
		printk_emergency_end()

Note that we actually start this mode automatically also
with pr_emerg() message.

But I am fine with whatever from the mentioned generic names.

> 
> --- random thoughts ---
> 
> another thing that bothers me a bit is that we need to place those
> printk_rescue_on/printk_rescue_off switches all over the kernel.
> sort of a root cause [in some of the cases] here is the fact that
> we don't have any feedback from printk_kthread in vprintk_emit():
> 	does printk_kthread make any progress?
> 	do we flush messages to the serial console?
> 	etc.
> 
> and we've got everything we need to have such a feedback in
> vprintk_emit():
> 
> 	a) console is not suspended so console_unlock() can call console drivers
> 	b) printk_kthread != NULL
> 	c) we are not in enforced rescue/emergency mode
> 	d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
> 	e) `console_seq' stands still
> 
> so we can have an automatic rescue mode fallback in vprintk_emit().
> if (a)-(e) are true then we give up on waking up printk_kthread,
> switch to rescue mode and attempt to console_trylock() directly from
> vprintk_emit(). the part that sucks here is that we need to give
> printk_kthread some time to catch up. for instance, if (e) is true
> for the past 50 invocations of vprintk_emit(), IOW:
> 
> 	- we added 50 lines to printk
> 	- none have been printed on the serial console
>
> then we
> 	- declare rescue
> 	- do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()

I am not sure if we are able to distinguish a flood of messages
from a real emergency situation.

If we start flushing messages directly when there is a flood
of messages, we will put back the original problem with soft
lookups.

Well, there is a handful of annotated locations at the moment.
I would start thinking of an automatic detection once we have
more of them and have more data for a good heuristic.

I still would like to see the kernel parameter/sysfs knob
that would allow to force the rescue/emergency mode all
the time ;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-24 14:43         ` Petr Mladek
@ 2017-03-25  0:18           ` Sergey Senozhatsky
  0 siblings, 0 replies; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-03-25  0:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Steven Rostedt,
	Andrew Morton, Linus Torvalds, Rafael J . Wysocki, linux-kernel,
	Sergey Senozhatsky

On (03/24/17 15:43), Petr Mladek wrote:
[..]
> > we have that thing which we call "old printk" mode, which is not
> > really informative. and my proposal is rename "old" mode and use
> > "printk rescue" mode instead. because we switch to that mode when
> > we are trying to "rescue" kernel logs. so the API can be something
> > like
> > 		printk_rescue_on()
> > 		printk_rescue_off()
> 
> Sounds good to me. Slight problem is that off() does not cause
> stopping the mode if we are nested.
> 
> Just one more attempt inspired by this:
> 
> 		printk_emergency_begin()
> 		printk_emergency_end()
> 
> Note that we actually start this mode automatically also
> with pr_emerg() message.

good. printk_emergency sounds OK to me.

[..]
> > and we've got everything we need to have such a feedback in
> > vprintk_emit():
> > 
> > 	a) console is not suspended so console_unlock() can call console drivers
> > 	b) printk_kthread != NULL
> > 	c) we are not in enforced rescue/emergency mode
> > 	d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
> > 	e) `console_seq' stands still
> > 
> > so we can have an automatic rescue mode fallback in vprintk_emit().
> > if (a)-(e) are true then we give up on waking up printk_kthread,
> > switch to rescue mode and attempt to console_trylock() directly from
> > vprintk_emit(). the part that sucks here is that we need to give
> > printk_kthread some time to catch up. for instance, if (e) is true
> > for the past 50 invocations of vprintk_emit(), IOW:
> > 
> > 	- we added 50 lines to printk
> > 	- none have been printed on the serial console
> >
> > then we
> > 	- declare rescue
> > 	- do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()
> 
> I am not sure if we are able to distinguish a flood of messages
> from a real emergency situation.

that's one of problems. yes.


> Well, there is a handful of annotated locations at the moment.
> I would start thinking of an automatic detection once we have
> more of them and have more data for a good heuristic.

so my 'automatic emergency' switch is for another case. when we
wake_up() printk_kthread, but it never prints anything. could be
caused by a missing printk_emergency annotation, or something else.


> I still would like to see the kernel parameter/sysfs knob
> that would allow to force the rescue/emergency mode all
> the time ;-)

sure.

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2017-03-22 17:59 ` [RFC][PATCH 0/4] printk: introduce printing kernel thread Peter Zijlstra
@ 2017-04-03 11:53 ` Sergey Senozhatsky
  2017-04-04 12:59   ` Petr Mladek
  5 siblings, 1 reply; 29+ messages in thread
From: Sergey Senozhatsky @ 2017-04-03 11:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

On (03/06/17 21:45), Sergey Senozhatsky wrote:
[..]
>  printk kthread changes the behavior of printk in one _corner case_.
>  The corner case is quite interesting and actually consists of two corner
>  cases. Suppose on SMP system there is only one CPU that printk()-s a lot,
>  the rest of CPUs don't lock console_sem and don't printk(). Previously
>  that printing CPU had been throttling itself (*) because of console drivers
>  call for every printk():
> 
>           CPU0
> 
>        printk("a")
>         console_unlock()
>          call_console_drivers("a")
> 
>          ...
> 
>        printk("z")
>         console_unlock()
>          call_console_drivers("z")
> 
>  * Given that no other CPU locks the console_sem.
> 
>  With printk kthread the case turns into this one:
> 
>           CPU0                               CPU1
> 
>        printk("a")
>        wake_up printk_kthread
>        ...                               printk_kthread
>        printk("k")                        console_unlock()
>        ...                                 call_console_drivers("a")
>        printk("z")                         call_console_drivers("b")
>                                            call_console_drivers("c")
>                                            ...
> 
> 
>  The second 'corner case' part here is that CPU0 may be much faster
>  than printing CPU, which may result in dropped printk messages.
> 
>  This all is absolutely possible even with out the printk-kthread.
>  A single console_lock() call from CPUx will result in exactly the
>  same condition. So it's not necessarily a regression. But there may
>  be some scenarios in the kernel that may suffer from this change.
>  From the top of my head -- sysrq backtrace dump, and, probably, OOM
>  print out and backtrace dump.

there is another possibility here.

being always reschedulable potentially can put us at risk of having
unpleasant situations when printk_kthread is getting preempted too
often (well, who knows what can happen on the system), which can slow
down logbuf emit process (printk_kthread) up to the point when printk()
CPUs will force log_store() to begin dropping the messages. this can
happen.

	-ss

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread
  2017-04-03 11:53 ` Sergey Senozhatsky
@ 2017-04-04 12:59   ` Petr Mladek
  0 siblings, 0 replies; 29+ messages in thread
From: Petr Mladek @ 2017-04-04 12:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel, Sergey Senozhatsky

On Mon 2017-04-03 20:53:11, Sergey Senozhatsky wrote:
> On (03/06/17 21:45), Sergey Senozhatsky wrote:
> [..]
> >  printk kthread changes the behavior of printk in one _corner case_.
> >  The corner case is quite interesting and actually consists of two corner
> >  cases. Suppose on SMP system there is only one CPU that printk()-s a lot,
> >  the rest of CPUs don't lock console_sem and don't printk(). Previously
> >  that printing CPU had been throttling itself (*) because of console drivers
> >  call for every printk():
> > 
> >           CPU0
> > 
> >        printk("a")
> >         console_unlock()
> >          call_console_drivers("a")
> > 
> >          ...
> > 
> >        printk("z")
> >         console_unlock()
> >          call_console_drivers("z")
> > 
> >  * Given that no other CPU locks the console_sem.
> > 
> >  With printk kthread the case turns into this one:
> > 
> >           CPU0                               CPU1
> > 
> >        printk("a")
> >        wake_up printk_kthread
> >        ...                               printk_kthread
> >        printk("k")                        console_unlock()
> >        ...                                 call_console_drivers("a")
> >        printk("z")                         call_console_drivers("b")
> >                                            call_console_drivers("c")
> >                                            ...
> > 
> > 
> >  The second 'corner case' part here is that CPU0 may be much faster
> >  than printing CPU, which may result in dropped printk messages.
> > 
> >  This all is absolutely possible even with out the printk-kthread.
> >  A single console_lock() call from CPUx will result in exactly the
> >  same condition. So it's not necessarily a regression. But there may
> >  be some scenarios in the kernel that may suffer from this change.
> >  From the top of my head -- sysrq backtrace dump, and, probably, OOM
> >  print out and backtrace dump.
> 
> there is another possibility here.
> 
> being always reschedulable potentially can put us at risk of having
> unpleasant situations when printk_kthread is getting preempted too
> often (well, who knows what can happen on the system), which can slow
> down logbuf emit process (printk_kthread) up to the point when printk()
> CPUs will force log_store() to begin dropping the messages. this can
> happen.

I believe that this will be rather a corner case. If it happens, we
could do something with scheduling priority and policy. Also there
is the possibility to fallback to the old mode.

We use some variants of the printk offload on SLE for years and
I am not aware of any complains of this sort.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-04-04 13:00 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-06 12:45 [RFC][PATCH 0/4] printk: introduce printing kernel thread Sergey Senozhatsky
2017-03-06 12:45 ` [RFC][PATCH 1/4] " Sergey Senozhatsky
2017-03-22 16:40   ` Petr Mladek
2017-03-23  5:12     ` Sergey Senozhatsky
2017-03-23 10:40       ` Petr Mladek
2017-03-24  5:20         ` Sergey Senozhatsky
2017-03-06 12:45 ` [RFC][PATCH 2/4] printk: offload printing from wake_up_klogd_work_func() Sergey Senozhatsky
2017-03-17 12:19   ` Petr Mladek
2017-03-18  9:57     ` Sergey Senozhatsky
2017-03-20 16:09       ` Petr Mladek
2017-03-21  4:01         ` Sergey Senozhatsky
2017-03-23  9:00         ` Sergey Senozhatsky
2017-03-23 12:11           ` Petr Mladek
2017-03-06 12:45 ` [RFC][PATCH 3/4] kernel, power: disable printk_kthread in unsafe places Sergey Senozhatsky
2017-03-22 15:38   ` Petr Mladek
2017-03-22 15:38     ` Petr Mladek
2017-03-06 12:45 ` [RFC][PATCH 4/4] printk: enable printk offloading Sergey Senozhatsky
2017-03-22 15:43   ` Petr Mladek
2017-03-22 16:40     ` Sergey Senozhatsky
2017-03-22 17:59 ` [RFC][PATCH 0/4] printk: introduce printing kernel thread Peter Zijlstra
2017-03-23  4:09   ` Sergey Senozhatsky
2017-03-23  8:51     ` Peter Zijlstra
2017-03-24  1:59       ` Sergey Senozhatsky
2017-03-24  4:43         ` Sergey Senozhatsky
2017-03-24 14:43         ` Petr Mladek
2017-03-25  0:18           ` Sergey Senozhatsky
2017-03-23 12:01   ` Petr Mladek
2017-04-03 11:53 ` Sergey Senozhatsky
2017-04-04 12:59   ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.