linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock()
@ 2017-11-02 13:51 Tejun Heo
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
  2017-11-07  4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 13:51 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt; +Cc: linux-kernel, kernel-team

We mark for waking up klogd whenever we see a new message sequence in
the main loop.  However, the actual wakeup is always at the end of the
function and we can easily test for the wakeup condition when we do
the final should-we-repeat check.

Move the wake_klogd condition check out of the main loop.  This avoids
doing the same thing repeatedly and groups similar checks into a
common place.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/printk/printk.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2184,10 +2184,6 @@ again:
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
-		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
-			seen_seq = log_next_seq;
-		}
 
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
@@ -2248,12 +2244,17 @@ skip:
 	up_console_sem();
 
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * Check whether userland needs notification.  Also, someone could
+	 * have filled up the buffer again, so re-check if there's
+	 * something to flush. In case we cannot trylock the console_sem
+	 * again, there's a new owner and the console_unlock() from them
+	 * will do the flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
+	if (seen_seq != log_next_seq) {
+		wake_klogd = true;
+		seen_seq = log_next_seq;
+	}
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
@ 2017-11-02 13:52 ` Tejun Heo
  2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
  2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
  2017-11-07  4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky
  1 sibling, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 13:52 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt
  Cc: linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

When printk flushing isn't contended, whoever creates messages gets to
flush them, which is great both for fairness and for keeping log
delivery synchronous.  However, when console drivers can't keep up
with the rate of new messages, which happens a lot, who ends up with
the flushing duty is determined by timing, and everyone else's
messages become asynchronous.

Unfortunately, the arbitrarily chosen flusher can easily be someone
calling printk from a non-sleepable and/or non-preemptible context,
such as while holding a spinlock.  This means that the flusher, who
can be stuck with the flushing duty for a long time, can't yield or be
preempted, pegging the CPU.  This can lead to RCU stall warnings, hung
task warnings, work_on_cpu() stalls, and all sorts of other failures,
some of which may generate more printk messages to warn about the
condition, further pegging the flusher and grinding the whole system
to a halt.

Even if preemptible, it can be really harmful to peg a random task
with the flushing duty for a very long time.  There can easily be
locking or other dependency chains where such pegging leads to more
messages, leading to a meltdown of the system.

The problem can be solved by introducing a dedicated async flush
worker, without harming any of the synchronicity guarantees.  This
patch updates console_unlock() so that each non-flusher caller is
responsible only up to the message that it sees on its first
iteration, which is guaranteed to include the message it printed, if
it printed any.

If more messages came in while flushing was in progress, which means
that those messages are already async, it schedules the dedicated
flusher, which keeps flushing until the buffer is empty.

The following repro makes the system completely unusable without this
patch.  After the patch, the system stays completely functional no
matter how long the console stays saturated.

  #include <linux/module.h>
  #include <linux/delay.h>
  #include <linux/sched.h>
  #include <linux/mutex.h>
  #include <linux/workqueue.h>

  static DEFINE_MUTEX(test_mutex);
  static bool stop_testing;

  static void atomic_printk_workfn(struct work_struct *work)
  {
	  while (!READ_ONCE(stop_testing)) {
		  msleep(100);
		  mutex_lock(&test_mutex);
		  mutex_unlock(&test_mutex);
		  preempt_disable();
		  printk("XXX ATOMIC\n");
		  preempt_enable();
		  cond_resched();
	  }
  }
  static DECLARE_WORK(atomic_printk_work, atomic_printk_workfn);

  static void hog_printk_workfn(struct work_struct *work)
  {
	  while (!READ_ONCE(stop_testing)) {
		  mutex_lock(&test_mutex);
		  printk("XXX HOG\n");
		  mutex_unlock(&test_mutex);
		  cond_resched();
	  }
  }
  static DECLARE_WORK(hog_printk_work, hog_printk_workfn);

  static int __init test_init(void)
  {
	  queue_work_on(0, system_wq, &atomic_printk_work);
	  msleep(100);
	  queue_work_on(1, system_wq, &hog_printk_work);

	  return 0;
  }

  static void __exit test_exit(void)
  {
	  WRITE_ONCE(stop_testing, true);
	  flush_work(&atomic_printk_work);
	  flush_work(&hog_printk_work);
  }

  module_init(test_init);
  module_exit(test_exit);

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/printk.c |   66 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 8 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -436,6 +437,12 @@ static char __log_buf[__LOG_BUF_LEN] __a
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/* async flush */
+static struct kthread_worker *console_async_worker;
+
+static void console_async_workfn(struct kthread_work *work);
+static DEFINE_KTHREAD_WORK(console_async_work, console_async_workfn);
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -2092,6 +2099,17 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+static void console_async_workfn(struct kthread_work *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static bool current_is_console_async(void)
+{
+	return console_async_worker && console_async_worker->task == current;
+}
+
 /*
  * Check if we have any console that is capable of printing while cpu is
  * booting or shutting down. Requires console_sem.
@@ -2141,7 +2159,8 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool do_cond_resched, request_async;
+	u64 target_seq;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2165,6 +2184,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	target_seq = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2185,6 +2205,18 @@ again:
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 
+		/*
+		 * This function can be called from any context and we
+		 * don't wanna get live-locked by others' messages.  Unless
+		 * we're the async worker, flush upto whatever is visible
+		 * on the first iteration which is guaranteed to include
+		 * the message this task printed if it did.  If there are
+		 * more messages to be printed, we'll punt to the async
+		 * worker.
+		 */
+		if (!target_seq || current_is_console_async())
+			target_seq = log_next_seq;
+
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
@@ -2196,7 +2228,7 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq >= target_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2246,21 +2278,33 @@ skip:
 	/*
 	 * Check whether userland needs notification.  Also, someone could
 	 * have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem
-	 * again, there's a new owner and the console_unlock() from them
-	 * will do the flush, no worries.
+	 * something to flush.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	if (seen_seq != log_next_seq) {
 		wake_klogd = true;
 		seen_seq = log_next_seq;
 	}
-	retry = console_seq != log_next_seq;
+	request_async = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (request_async) {
+		if (console_async_worker) {
+			/* async is online, punt */
+			kthread_queue_work(console_async_worker,
+					   &console_async_work);
+		} else {
+			/*
+			 * We're responsible for flushing.  In case we
+			 * cannot trylock the console_sem again, there's a
+			 * new owner and the console_unlock() from them
+			 * will do the flush, no worries.
+			 */
+			if (console_trylock())
+				goto again;
+		}
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
@@ -2647,6 +2691,12 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
+	console_async_worker = kthread_create_worker(0 , "console_async");
+	if (IS_ERR(console_async_worker)) {
+		pr_err("console: failed to initialize async flusher\n");
+		console_async_worker = NULL;
+	}
+
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;


* [PATCH TRIVIAL UPDATE 2/2] printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
@ 2017-11-02 14:51   ` Tejun Heo
  2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 14:51 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt
  Cc: linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

When printk flushing isn't contended, whoever creates messages gets to
flush them, which is great both for fairness and for keeping log
delivery synchronous.  However, when console drivers can't keep up
with the rate of new messages, which happens a lot, who ends up with
the flushing duty is determined by timing, and everyone else's
messages become asynchronous.

Unfortunately, the arbitrarily chosen flusher can easily be someone
calling printk from a non-sleepable and/or non-preemptible context,
such as while holding a spinlock.  This means that the flusher, who
can be stuck with the flushing duty for a long time, can't yield or be
preempted, pegging the CPU.  This can lead to RCU stall warnings, hung
task warnings, work_on_cpu() stalls, and all sorts of other failures,
some of which may generate more printk messages to warn about the
condition, further pegging the flusher and grinding the whole system
to a halt.

Even if preemptible, it can be really harmful to peg a random task
with the flushing duty for a very long time.  There can easily be
locking or other dependency chains where such pegging leads to more
messages, leading to a meltdown of the system.

The problem can be solved by introducing a dedicated async flush
worker, without harming any of the synchronicity guarantees.  This
patch updates console_unlock() so that each non-flusher caller is
responsible only up to the message that it sees on its first
iteration, which is guaranteed to include the message it printed, if
it printed any.

If more messages came in while flushing was in progress, which means
that those messages are already async, it schedules the dedicated
flusher, which keeps flushing until the buffer is empty.

The following repro makes the system completely unusable without this
patch.  After the patch, the system stays completely functional no
matter how long the console stays saturated.

  #include <linux/module.h>
  #include <linux/delay.h>
  #include <linux/sched.h>
  #include <linux/mutex.h>
  #include <linux/workqueue.h>

  static DEFINE_MUTEX(test_mutex);
  static bool stop_testing;

  static void atomic_printk_workfn(struct work_struct *work)
  {
	  while (!READ_ONCE(stop_testing)) {
		  msleep(100);
		  mutex_lock(&test_mutex);
		  mutex_unlock(&test_mutex);
		  preempt_disable();
		  printk("XXX ATOMIC\n");
		  preempt_enable();
		  cond_resched();
	  }
  }
  static DECLARE_WORK(atomic_printk_work, atomic_printk_workfn);

  static void hog_printk_workfn(struct work_struct *work)
  {
	  while (!READ_ONCE(stop_testing)) {
		  mutex_lock(&test_mutex);
		  printk("XXX HOG\n");
		  mutex_unlock(&test_mutex);
		  cond_resched();
	  }
  }
  static DECLARE_WORK(hog_printk_work, hog_printk_workfn);

  static int __init test_init(void)
  {
	  queue_work_on(0, system_wq, &atomic_printk_work);
	  msleep(100);
	  queue_work_on(1, system_wq, &hog_printk_work);

	  return 0;
  }

  static void __exit test_exit(void)
  {
	  WRITE_ONCE(stop_testing, true);
	  flush_work(&atomic_printk_work);
	  flush_work(&hog_printk_work);
  }

  module_init(test_init);
  module_exit(test_exit);

v2: checkpatch style fix and comment added to console_async_workfn().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/printk.c |   70 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 8 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -436,6 +437,12 @@ static char __log_buf[__LOG_BUF_LEN] __a
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/* async flush */
+static struct kthread_worker *console_async_worker;
+
+static void console_async_workfn(struct kthread_work *work);
+static DEFINE_KTHREAD_WORK(console_async_work, console_async_workfn);
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -2092,6 +2099,21 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+static void console_async_workfn(struct kthread_work *work)
+{
+	console_lock();
+	/*
+	 * The following unlock combined with the true return from
+	 * current_is_console_async() keeps flushing log_buf until drained.
+	 */
+	console_unlock();
+}
+
+static bool current_is_console_async(void)
+{
+	return console_async_worker && console_async_worker->task == current;
+}
+
 /*
  * Check if we have any console that is capable of printing while cpu is
  * booting or shutting down. Requires console_sem.
@@ -2141,7 +2163,8 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool do_cond_resched, request_async;
+	u64 target_seq;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2165,6 +2188,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	target_seq = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2185,6 +2209,18 @@ again:
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 
+		/*
+		 * This function can be called from any context and we
+		 * don't wanna get live-locked by others' messages.  Unless
+		 * we're the async worker, flush upto whatever is visible
+		 * on the first iteration which is guaranteed to include
+		 * the message this task printed if it did.  If there are
+		 * more messages to be printed, we'll punt to the async
+		 * worker.
+		 */
+		if (!target_seq || current_is_console_async())
+			target_seq = log_next_seq;
+
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
@@ -2196,7 +2232,7 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq >= target_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2246,21 +2282,33 @@ skip:
 	/*
 	 * Check whether userland needs notification.  Also, someone could
 	 * have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem
-	 * again, there's a new owner and the console_unlock() from them
-	 * will do the flush, no worries.
+	 * something to flush.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	if (seen_seq != log_next_seq) {
 		wake_klogd = true;
 		seen_seq = log_next_seq;
 	}
-	retry = console_seq != log_next_seq;
+	request_async = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (request_async) {
+		if (console_async_worker) {
+			/* async is online, punt */
+			kthread_queue_work(console_async_worker,
+					   &console_async_work);
+		} else {
+			/*
+			 * We're responsible for flushing.  In case we
+			 * cannot trylock the console_sem again, there's a
+			 * new owner and the console_unlock() from them
+			 * will do the flush, no worries.
+			 */
+			if (console_trylock())
+				goto again;
+		}
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
@@ -2647,6 +2695,12 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
+	console_async_worker = kthread_create_worker(0, "console_async");
+	if (IS_ERR(console_async_worker)) {
+		pr_err("console: failed to initialize async flusher\n");
+		console_async_worker = NULL;
+	}
+
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
  2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
@ 2017-11-04  4:24   ` Sergey Senozhatsky
  2017-11-07  0:22     ` Tejun Heo
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-04  4:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, linux-kernel,
	kernel-team, Linus Torvalds, Andrew Morton

On (11/02/17 06:52), Tejun Heo wrote:
> 
> When printk flushing isn't contended, whoever creates messages gets to
> flush them, which is great both for fairness and for keeping log
> delivery synchronous.  However, when console drivers can't keep up
> with the rate of new messages, which happens a lot, who ends up with
> the flushing duty is determined by timing, and everyone else's
> messages become asynchronous.


Hello Tejun,

thanks for the patch set. we are currently looking at another approach:
lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home

would you be interested in taking a look?

there are some concerns, like a huge number of printk-s happening while
console_sem is locked. e.g. console_lock()/console_unlock() on one of the
CPUs, or console_lock(); printk(); ... printk(); console_unlock();

>
> the problem of "the last printk()", which will take
> over and do the flush.
> 
> CPU0                                    CPU1  ~  CPU99
> console_lock();
>                                        printk(); ... printk();
> console_unlock();
>                                            IRQ on CPU2
>                                             printk()
>                                              // take over console_sem
>                                              console_unlock()
>

	-ss


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
@ 2017-11-07  0:22     ` Tejun Heo
  2017-11-07  2:04       ` Sergey Senozhatsky
  2017-11-08 16:20       ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-07  0:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, linux-kernel, kernel-team,
	Linus Torvalds, Andrew Morton

Hello,

On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> thanks for the patch set. we are currently looking at another approach:
> lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
> 
> would you be interested in taking a look?
 
Hmm... It took some tweaking but the code at the end locks up the
machine w/ Steven's patch applied and it's not that contrived a case
(e.g. out of memory messages from packet tx/rx paths while OOM is in
progress).

> there are some concerns, like a huge number of printk-s happening while
> console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> CPUs, or console_lock(); printk(); ... printk(); console_unlock();

Unless we make all messages fully synchronous, I don't think there's a
good solution for that and I don't think we wanna make everything
fully synchronous.  The information value of messages go down pretty
fast in a message deluge and as long as we stay synchronous in the
beginning, information loss usually isn't the problem.

The current code can lose a lot of messages but the way it loses
messages usually doesn't lead to loss of important information.  The
proposed patch doesn't materially change when and what we lose.  It
just moves the flush infinite loop into a safe context.

Thanks.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 1000; i++)
			printk("%-80s\n", "XXX TIMER");

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}	

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++)
			printk("%-80s\n", "XXX PREEMPT");
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);
	return 0;
}

static void __exit test_exit(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  0:22     ` Tejun Heo
@ 2017-11-07  2:04       ` Sergey Senozhatsky
  2017-11-07 13:23         ` Tejun Heo
  2017-11-08 16:20       ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-07  2:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	kernel-team, Linus Torvalds, Andrew Morton

Hi Tejun,

On (11/06/17 16:22), Tejun Heo wrote:
> Hello,
> 
> On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> > thanks for the patch set. we are currently looking at another approach:
> > lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
> > 
> > would you be interested in taking a look?
>  
> Hmm... It took some tweaking but the code at the end locks up the
> machine w/ Steven's patch applied and it's not that contrived a case
> (e.g. out of memory messages from packet tx/rx paths while OOM is in
> progress).

thanks!

just to make sure. there is a typo in Steven's patch:

	while (!READ_ONCE(console_waiter))

should be

	while (READ_ONCE(console_waiter))

is this the "tweaking" you are talking about?

> > there are some concerns, like a huge number of printk-s happening while
> > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
> 
> Unless we make all messages fully synchronous, I don't think there's a
> good solution for that and I don't think we wanna make everything
> fully synchronous.

this is where it becomes complicated. offloading logic is not binary,
unfortunately. we normally want to offload; but not always. things
like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
regardless the consequences. some of sysrq prints out even do
touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
printk-kthread patch set tries to consider those cases and to avoid
any offloading.

	-ss


* Re: [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
@ 2017-11-07  4:40 ` Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-07  4:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, linux-kernel,
	kernel-team

On (11/02/17 06:51), Tejun Heo wrote:
> We mark for waking up klogd whenever we see a new message sequence in
> the main loop.  However, the actual wakeup is always at the end of the
> function and we can easily test for the wakeup condition when we do
> the final should-we-repeat check.
> 
> Move the wake_klogd condition check out of the main loop.  This avoids
> doing the same thing repeatedly and groups similar checks into a
> common place.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>

looks good to me.

FWIW,

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  2:04       ` Sergey Senozhatsky
@ 2017-11-07 13:23         ` Tejun Heo
  2017-11-08  5:29           ` Sergey Senozhatsky
  2017-11-08 16:22           ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-07 13:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	kernel-team, Linus Torvalds, Andrew Morton

Hello, Sergey.

On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> just to make sure. there is a typo in Steven's patch:
> 
> 	while (!READ_ONCE(console_waiter))
> 
> should be
> 
> 	while (READ_ONCE(console_waiter))
> 
> is this the "tweaking" you are talking about?

Oh, I was talking about tweaking the repro, but I'm not sure the above
would change anything.  The problem that the repro demonstrates is a
message deluge involving a non-sleepable flusher + local irq (or
other atomic contexts) message producer.

In the above case, none of the involved contexts can serve as the
flusher for a long time without messing up the system.  If you wanna
allow printks to be async without falling into these lockups, you
gotta introduce an independent safe context to flush from.

> > > there are some concerns, like a huge number of printk-s happening while
> > > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
> > 
> > Unless we make all messages fully synchronous, I don't think there's a
> > good solution for that and I don't think we wanna make everything
> > fully synchronous.
> 
> this is where it becomes complicated. offloading logic is not binary,
> unfortunately. we normally want to offload; but not always. things
> like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
> regardless the consequences. some of sysrq prints out even do
> touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
> printk-kthread patch set tries to consider those cases and to avoid
> any offloading.

Yeah, sure, selectively opting out of asynchronous operation is a
different (solvable) issue.  Also, just to be clear, the proposed
patch doesn't make any of these worse in any meaningful way - e.g. we
could end up trapping a nice 20 task pinned to an overloaded CPU in
the flusher role.

The following is a completely untested patch to show how we can put
the console in full sync mode, just the general idea.  I'm a bit
skeptical we really wanna do this given that we already (with or
without the patch) stay sync for most of these events due to the way
we go async, but, yeah, if we wanna do that, we can do that.

Thanks.

---
 kernel/printk/printk.c |   40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -438,6 +438,7 @@ static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
 /* async flush */
+static bool console_sync_mode;
 static struct kthread_worker *console_async_worker;
 
 static void console_async_workfn(struct kthread_work *work);
@@ -2051,7 +2052,8 @@ void console_lock(void)
 {
 	might_sleep();
 
-	down_console_sem();
+	down_console_sem();	// make it return w/ irq disabled on locked
+				// and don't reenable till unlock if sync mode
 	if (console_suspended)
 		return;
 	console_locked = 1;
@@ -2069,8 +2071,22 @@ EXPORT_SYMBOL(console_lock);
  */
 int console_trylock(void)
 {
-	if (down_trylock_console_sem())
+	unsigned long flags;
+	bool locked;
+
+	printk_safe_enter_irqsave(flags);
+	do {
+		// we can make this nest safe if necessary by remembering the holding cpu
+		locked = down_trylock_console_sem();
+		if (locked)
+			break;
+		cpu_relax();
+	} while (!locked && READ_ONCE(console_sync_mode));
+	printk_safe_exit_irqrestore(flags);
+
+	if (!locked)
 		return 0;
+
 	if (console_suspended) {
 		up_console_sem();
 		return 0;
@@ -2087,7 +2103,8 @@ int console_trylock(void)
 	 * rcu_preempt_depth(), otherwise RCU read sections modify
 	 * preempt_count().
 	 */
-	console_may_schedule = !oops_in_progress &&
+	console_may_schedule = !console_sync_mode &&
+			!oops_in_progress &&
 			preemptible() &&
 			!rcu_preempt_depth();
 	return 1;
@@ -2218,7 +2235,8 @@ again:
 		 * more messages to be printed, we'll punt to the async
 		 * worker.
 		 */
-		if (!target_seq || current_is_console_async())
+		if (!target_seq || current_is_console_async() ||
+		    !console_sync_mode)
 			target_seq = log_next_seq;
 
 		if (console_seq < log_first_seq) {
@@ -2293,7 +2311,7 @@ skip:
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (request_async) {
+	if (request_async && !console_sync_mode) {
 		if (console_async_worker) {
 			/* async is online, punt */
 			kthread_queue_work(console_async_worker,
@@ -2413,6 +2431,18 @@ void console_start(struct console *conso
 }
 EXPORT_SYMBOL(console_start);
 
+void console_enter_sync_mode(void)
+{
+	console_lock();
+	WRITE_ONCE(console_sync_mode, true);
+	console_unlock();
+}
+
+void console_leave_sync_mode(void)
+{
+	WRITE_ONCE(console_sync_mode, false);
+}
+
 static int __read_mostly keep_bootcon;
 
 static int __init keep_bootcon_setup(char *str)

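For illustration (same untested spirit as the patch), the retry idea in
the console_trylock() hunk above can be modeled in plain userspace C.
All names below (model_*) are hypothetical stand-ins, not the kernel
API, and a bounded spin count replaces the kernel's unbounded
cpu_relax() loop so the model stays testable:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Model of console_sem as a simple atomic flag. */
static atomic_flag con_sem = ATOMIC_FLAG_INIT;
static atomic_bool sync_mode;

/* Returns true iff the semaphore was NOT acquired, like down_trylock(). */
static bool model_down_trylock(void)
{
	return atomic_flag_test_and_set(&con_sem);
}

static void model_up(void)
{
	atomic_flag_clear(&con_sem);
}

/*
 * Model of the proposed console_trylock(): in sync mode keep retrying
 * instead of giving up, so the caller ends up doing the flushing itself.
 */
static bool model_console_trylock(int max_spins)
{
	bool locked;

	do {
		locked = !model_down_trylock();
		if (locked)
			break;
		sched_yield();	/* stand-in for cpu_relax() */
	} while (atomic_load(&sync_mode) && --max_spins > 0);

	return locked;
}
```

Note that in the sketch above the real loop runs with
printk_safe_enter_irqsave() held and spins indefinitely while sync mode
is set, which is exactly why sync mode would have to be reserved for
paths like sysrq and panic.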
^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07 13:23         ` Tejun Heo
@ 2017-11-08  5:29           ` Sergey Senozhatsky
  2017-11-08 14:33             ` Tejun Heo
  2017-11-08 16:22           ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-08  5:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	Steven Rostedt, linux-kernel, kernel-team, Linus Torvalds,
	Andrew Morton

Hello Tejun,

On (11/07/17 05:23), Tejun Heo wrote:
> Hello, Sergey.
> 
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> > 
> > 	while (!READ_ONCE(console_waiter))
> > 
> > should be
> > 
> > 	while (READ_ONCE(console_waiter))
> > 
> > is this the "tweaking" you are talking about?
> 
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything.  The problem that the repro demonstrates is a
> message deluge involving an non-sleepable flusher + local irq (or
> other atomic contexts) message producer.
> 
> In the above case, none of the involved contexts can serve as the
> flusher for a long time without messing up the system.  If you wanna
> allow printks to be async without falling into these lockups, you
> gotta introduce an independent safe context to flush from.

we are in agreement.
I Cc-ed you on another thread; let's merge the discussions.

> > > > there are some concerns, like a huge number of printk-s happening while
> > > > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > > > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
> > > 
> > > Unless we make all messages fully synchronous, I don't think there's a
> > > good solution for that and I don't think we wanna make everything
> > > fully synchronous.
> > 
> > this is where it becomes complicated. offloading logic is not binary,
> > unfortunately. we normally want to offload, but not always. things
> > like sysrq, late PM warnings, kexec, etc. want to stay fully sync,
> > regardless of the consequences. some of the sysrq printouts even do
> > touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). the current
> > printk-kthread patch set tries to account for those cases and avoid
> > any offloading.
> 
> Yeah, sure, selectively opting out of asynchronous operation is a
> different (solvable) issue.  Also, just to be clear, the proposed
> patch doesn't make any of these worse in any meaningful way - even
> now we could end up trapping a nice-20 task pinned to an overloaded
> CPU in the flusher role.
> 
> The following is a completely untested patch to show how we can put
> the console in full sync mode, just the general idea.  I'm a bit
> skeptical we really wanna do this given that we already (with or
> without the patch) stay sync for most of these events due to the way
> we go async, but, yeah, if we wanna do that, we can do that.

we've been going in a slightly different direction in printk-kthread.

we keep printk sync by default [as opposed to the previous "immediately
offload" approach]. people asked for it, some people demanded it. we
offload to printk-kthread only when we detect that this particular
task on this particular CPU has been printing (without rescheduling)
for 1/2 of the watchdog threshold value. IOW, if we see that we are
heading towards the lockup limit, then we offload; otherwise we let it
loop in console_unlock().
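roughly, that heuristic boils down to something like the following
sketch (hypothetical names; the actual patch set tracks the start
timestamp per task inside console_unlock()):

```c
#include <stdbool.h>

typedef unsigned long long u64;

#define NSEC_PER_SEC 1000000000ULL

/*
 * Offload sketch: a task that has been printing without rescheduling
 * for half of the watchdog threshold hands the rest of log_buf off to
 * printk-kthread instead of heading into a lockup report.
 */
static bool should_offload(u64 printing_start_ns, u64 now_ns,
			   u64 watchdog_thresh_ns)
{
	return now_ns - printing_start_ns >= watchdog_thresh_ns / 2;
}
```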

	-ss


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-08  5:29           ` Sergey Senozhatsky
@ 2017-11-08 14:33             ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-08 14:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	kernel-team, Linus Torvalds, Andrew Morton

Hello,

On Wed, Nov 08, 2017 at 02:29:35PM +0900, Sergey Senozhatsky wrote:
> > The following is a completely untested patch to show how we can put
> > the console in full sync mode, just the general idea.  I'm a bit
> > skeptical we really wanna do this given that we already (with or
> > without the patch) stay sync for most of these events due to the way
> > we go async, but, yeah, if we wanna do that, we can do that.
> 
> we've been going in a slightly different direction in printk-kthread.
> 
> we keep printk sync by default [as opposed to the previous "immediately
> offload" approach]. people asked for it, some people demanded it. we
> offload to printk-kthread only when we detect that this particular
> task on this particular CPU has been printing (without rescheduling)
> for 1/2 of the watchdog threshold value. IOW, if we see that we are
> heading towards the lockup limit, then we offload; otherwise we let it
> loop in console_unlock().

FWIW, I'm still a bit skeptical whether that's the right direction.
People asking for it doesn't mean that the flipside isn't worse.
Hmm... I think my impression mostly comes from the fact that throughout
the years I can only count a few times when loss of printk messages was
an actual issue.  The system getting slowed down / locking up from a
printk message deluge was *far* more prevalent.

Thanks.

-- 
tejun


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  0:22     ` Tejun Heo
  2017-11-07  2:04       ` Sergey Senozhatsky
@ 2017-11-08 16:20       ` Steven Rostedt
  1 sibling, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-11-08 16:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Petr Mladek, linux-kernel, kernel-team,
	Linus Torvalds, Andrew Morton

On Mon, 6 Nov 2017 16:22:54 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> > thanks for the patch set. we are currently looking at another approach:
> > lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
> > 
> > would you be interested in taking a look?  
>  
> Hmm... It took some tweaking, but the code at the end locks up the
> machine with Steven's patch applied, and it's not that contrived a
> case (e.g. out-of-memory messages from packet tx/rx paths while OOM
> handling is in progress).

Did you apply it with or without the while (READ_ONCE(console_waiter))
fix?

Because that is crucial.

-- Steve


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07 13:23         ` Tejun Heo
  2017-11-08  5:29           ` Sergey Senozhatsky
@ 2017-11-08 16:22           ` Steven Rostedt
  2017-11-08 16:28             ` Tejun Heo
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2017-11-08 16:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

On Tue, 7 Nov 2017 05:23:50 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Sergey.
> 
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> > 
> > 	while (!READ_ONCE(console_waiter))
> > 
> > should be
> > 
> > 	while (READ_ONCE(console_waiter))
> > 
> > is this the "tweaking" you are talking about?  
> 
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything. 

No, it would change an awful lot.

It would cause two printers to access the consoles at the same time,
which would lead to unpredictable behavior.

-- Steve


* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-08 16:22           ` Steven Rostedt
@ 2017-11-08 16:28             ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-08 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

Hello, Steven.

On Wed, Nov 08, 2017 at 11:22:29AM -0500, Steven Rostedt wrote:
> > Oh, I was talking about tweaking the repro, but I'm not sure the above
> > would change anything. 
> 
> No, it would change an awful lot.
> 
> It would cause two printers to access the consoles at the same time,
> which would lead to unpredictable behavior.

Sure, I'll give it a shot later, but in the repro there literally is no
context which is safe, so I kinda have a difficult time imagining how
that will make things materially better.  Neither of the producers can
get scheduled out, whether or not they switch back and forth between
them, so RCU stalls seem rather unavoidable without introducing an
independent context.

Thanks.

-- 
tejun


end of thread, other threads:[~2017-11-08 16:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
2017-11-07  0:22     ` Tejun Heo
2017-11-07  2:04       ` Sergey Senozhatsky
2017-11-07 13:23         ` Tejun Heo
2017-11-08  5:29           ` Sergey Senozhatsky
2017-11-08 14:33             ` Tejun Heo
2017-11-08 16:22           ` Steven Rostedt
2017-11-08 16:28             ` Tejun Heo
2017-11-08 16:20       ` Steven Rostedt
2017-11-07  4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky
