* [RFC][PATCH] printk: do not flush printk_safe from irq_work
@ 2018-01-24  9:37 Sergey Senozhatsky
  2018-01-26 15:26 ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2018-01-24  9:37 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek, Tejun Heo
  Cc: linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Tejun Heo reported that net_console can cause recursive printk()-s
from call_console_drivers() (under OOM etc.), and that this can lock up
the CPU: the CPU calls call_console_drivers() to print pending logbuf
messages, while recursive printk()-s issued by one of the console
drivers keep adding new messages to the logbuf.

Note, this is not a printk_safe problem. In fact printk_safe helps
us to address the issue here and to rate-limit recursive printk()-s.

Schematically, what we have at the moment is:

==================================================
printk()
 preempt_disable()
  console_unlock()
  {
    for (;;) {
     local_irq_save()
     printk_safe_enter()

     call_console_drivers()
         printk()
          vprintk_safe()
           irq_work_queue()

     printk_safe_exit()
     local_irq_restore()

     << irq work >>
       printk_safe_flush()
         printk()
   }
  }
 preempt_enable()
==================================================

So the CPU triggers printk() recursion in call_console_drivers(), and
immediately after local_irq_restore() it executes the printk_safe
flush irq_work, which adds even more pending messages to the logbuf
that it has to print. The CPU hits printk() recursion again while it
prints those pending messages, which results in even more pending
messages being flushed from printk_safe via irq_work. And so on.

The idea is to delay the printk_safe flush until the CPU is in a
preemptible context, so we won't lock it up.

The new behaviour is:

==================================================
printk()
 preempt_disable()
  console_unlock()
  {
    for (;;) {
     local_irq_save()
     printk_safe_enter()

     call_console_drivers()
         printk()
          vprintk_safe()
           queue_work_on(smp_processor_id())

     printk_safe_exit()
     local_irq_restore()
   }
  }
 preempt_enable()

 << work queue >>
   printk_safe_flush()
     printk()
==================================================

Since console_unlock() is called with preemption disabled, we now
cannot schedule the flush work as long as the CPU is printing messages.
We can only flush printk_safe messages after the CPU returns from
console_unlock() and re-enables preemption.

It might look like the delayed flush increases the possibility of
losing printk_safe messages (the printk_safe per-CPU buffers are pretty
limited in size). But printk_safe was never meant to be used for the
huge amounts of data which e.g. debugging of serial consoles can
generate. Its goal is to make recursive printk()-s less deadlock-prone.
With this patch we add one more thing: avoiding a CPU lockup which
might be caused by printk_safe_flush().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tejun Heo <tj@kernel.org>
---
 kernel/printk/printk_safe.c | 90 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..eb5a4049314a 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/workqueue.h>
 
 #include "internal.h"
 
@@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
 	atomic_t		len;	/* length of written data */
 	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	/* IRQ work that flushes printk_nmi buffer */
+	struct irq_work		irq_flush_work;
+	/* WQ work that flushes printk_safe buffer */
+	struct work_struct	wq_flush_work;
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +65,33 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
 	if (printk_safe_irq_ready)
-		irq_work_queue(&s->work);
+		irq_work_queue(&s->irq_flush_work);
+}
+
+/*
+ * Get flushed from process context. printk() recursion may happen in
+ * call_console_drivers() while the CPU that hit the recursion is
+ * printing pending logbuf messages. In this case flushing the printk_safe
+ * buffer from IRQ work may result in a CPU lockup - we flush printk_safe
+ * every time we enable local IRQs after call_console_drivers(), which adds
+ * new messages to the logbuf, so the CPU has more messages to print,
+ * triggering more recursive printk()-s in call_console_drivers() and
+ * appending even more messages to the logbuf.
+ *
+ * Process context cannot flush messages while CPU is in console_unlock(),
+ * because printk() calls console_unlock() with preemption disabled. So
+ * we can flush recursive call_console_drivers() printk()-s only after CPU
+ * leaves console_unlock() printing loop and enables preemption.
+ */
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+	if (printk_safe_irq_ready)
+		queue_work_on(smp_processor_id(),
+				system_wq,
+				&s->wq_flush_work);
 }
 
 /*
@@ -89,7 +116,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	/* The trailing '\0' is not counted into len. */
 	if (len >= sizeof(s->buffer) - 1) {
 		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
+		queue_irq_flush_work(s);
 		return 0;
 	}
 
@@ -112,7 +139,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	if (atomic_cmpxchg(&s->len, len, len + add) != len)
 		goto again;
 
-	queue_flush_work(s);
+	queue_irq_flush_work(s);
 	return add;
 }
 
@@ -186,12 +213,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
 	static raw_spinlock_t read_lock =
 		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
 	unsigned long flags;
 	size_t len;
 	int i;
@@ -243,6 +268,46 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
+static void wq_flush_work_fn(struct work_struct *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, wq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
+/*
+ * We can't queue wq work directly from vprintk_safe(), because we can
+ * deadlock. For instance:
+ *
+ * queue_work()
+ *  spin_lock(pool->lock)
+ *   printk()
+ *    call_console_drivers()
+ *     vprintk_safe()
+ *      queue_work()
+ *       spin_lock(pool->lock)
+ *
+ * So we use irq_work, from which we queue the wq work. WQ disables local
+ * IRQs while it works with a pool, so if our irq_work is running on this
+ * CPU then we can expect that pool->lock is not held on this CPU.
+ */
+static void irq_to_wq_flush_work_fn(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+	queue_wq_flush_work(s);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -256,9 +321,9 @@ void printk_safe_flush(void)
 
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
+		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu));
 #endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
+		__printk_safe_flush(&per_cpu(safe_print_seq, cpu));
 	}
 }
 
@@ -387,11 +452,12 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->irq_flush_work, irq_to_wq_flush_work_fn);
+		INIT_WORK(&s->wq_flush_work, wq_flush_work_fn);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->irq_flush_work, irq_flush_work_fn);
 #endif
 	}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-01-24  9:37 [RFC][PATCH] printk: do not flush printk_safe from irq_work Sergey Senozhatsky
@ 2018-01-26 15:26 ` Petr Mladek
  2018-01-29  2:29   ` Sergey Senozhatsky
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2018-01-26 15:26 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Wed 2018-01-24 18:37:23, Sergey Senozhatsky wrote:
> Tejun Heo reported that net_console can cause recursive printk()-s
> from call_console_drivers() (under OOM etc.), and that this can lock up
> the CPU: the CPU calls call_console_drivers() to print pending logbuf
> messages, while recursive printk()-s issued by one of the console
> drivers keep adding new messages to the logbuf.

> Note, this is not a printk_safe problem. In fact printk_safe helps
> us to address the issue here and to rate-limit recursive printk()-s.
> 
> Schematically, what we have at the moment is:
> 
> ==================================================
> printk()
>  preempt_disable()
>   console_unlock()
>   {
>     for (;;) {
>      local_irq_save()
>      printk_safe_enter()
> 
>      call_console_drivers()
>          printk()
>           vprintk_safe()
>            irq_work_queue()
> 
>      printk_safe_exit()
>      local_irq_restore()
> 
>      << irq work >>
>        printk_safe_flush()
>          printk()
>    }
>   }
>  preempt_enable()
> ==================================================
> 
> So the CPU triggers printk() recursion in call_console_drivers(), and
> immediately after local_irq_restore() it executes the printk_safe
> flush irq_work, which adds even more pending messages to the logbuf
> that it has to print. The CPU hits printk() recursion again while it
> prints those pending messages, which results in even more pending
> messages being flushed from printk_safe via irq_work. And so on.
> 
> The idea is to delay the printk_safe flush until the CPU is in a
> preemptible context, so we won't lock it up.
> 
> The new behaviour is:
> 
> ==================================================
> printk()
>  preempt_disable()
>   console_unlock()
>   {
>     for (;;) {
>      local_irq_save()
>      printk_safe_enter()
> 
>      call_console_drivers()
>          printk()
>           vprintk_safe()
>            queue_work_on(smp_processor_id())
> 
>      printk_safe_exit()
>      local_irq_restore()
>    }
>   }
>  preempt_enable()
> 
>  << work queue >>
>    printk_safe_flush()
>      printk()
> ==================================================
> 
> Since console_unlock() is called with preemption disabled, we now
> cannot schedule the flush work as long as the CPU is printing messages.
> We can only flush printk_safe messages after the CPU returns from
> console_unlock() and re-enables preemption.
> 
> It might look like the delayed flush increases the possibility of
> losing printk_safe messages (the printk_safe per-CPU buffers are pretty
> limited in size). But printk_safe was never meant to be used for the
> huge amounts of data which e.g. debugging of serial consoles can
> generate. Its goal is to make recursive printk()-s less deadlock-prone.
> With this patch we add one more thing: avoiding a CPU lockup which
> might be caused by printk_safe_flush().

I have two concerns about this approach:

First, this delays showing possibly valuable information until
preemption is enabled. That might never happen if the system
is in big trouble. In any case, the delay might be much longer
than it was before.

Second, it makes printk() dependent on another non-trivial
subsystem: workqueues.

If I got it right, we want to limit the number of recursively added
messages that might be printed from inside console_unlock(),
especially from call_console_drivers().


The following, a bit ugly, solution came to my mind. We could
think about it as extending the printk_context. It counts
printk()-s called in this context and does nothing once we reach
the limit. The difference is that the context is task-specific
instead of CPU-specific.

I used only one compiler barrier, which looks strange. Well, it
is late Friday here and this is just a POC. I wonder what you
think about this approach.


Just a compile tested code:

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..747098310179 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,6 +23,9 @@
 #define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
 extern raw_spinlock_t logbuf_lock;
+extern struct task_struct *console_unlock_owner;
+extern int console_recursion_count;
+extern int console_recursion_limit;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..4508bb579c5b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -81,6 +81,9 @@ EXPORT_SYMBOL(oops_in_progress);
 static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
+struct task_struct *console_unlock_owner;
+int console_recursion_count;
+int console_recursion_limit = 100;
 
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
@@ -2293,6 +2296,7 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry;
+	int recursion_count = 0;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2328,6 +2332,20 @@ void console_unlock(void)
 		return;
 	}
 
+	/* Start section where we prevent recursion */
+	console_unlock_owner = current;
+	/*
+	 * Counter is preserved even when console_lock is released,
+	 * re-taken and we end up here again.
+	 */
+	console_recursion_count = recursion_count;
+	/*
+	 * The above values are meaningful only for the current task.
+	 * Just make sure that this is set before calling anything that
+	 * might call printk().
+	 */
+	barrier();
+
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2414,6 +2432,8 @@ void console_unlock(void)
 
 	raw_spin_unlock(&logbuf_lock);
 
+	recursion_count = console_recursion_count;
+	console_unlock_owner = NULL;
 	up_console_sem();
 
 	/*
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..46209b4fd960 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -364,6 +364,14 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
 
+	/*
+	 * No barrier is needed. The recursion counter is increased only by
+	 * the task owning the console lock.
+	 */
+	if (console_unlock_owner == current &&
+	    console_recursion_count++ > console_recursion_limit)
+		return 0;
+
 	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);



PS: I answered this mail because the original discussion[1] has
already been way too long and covered many other issues. But feel
free to ask me to move it there.

[1] https://lkml.kernel.org/r/20180117121251.7283a56e@gandalf.local.home


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-01-26 15:26 ` Petr Mladek
@ 2018-01-29  2:29   ` Sergey Senozhatsky
  2018-01-30 12:23     ` Petr Mladek
  2018-02-01 18:00     ` Steven Rostedt
  0 siblings, 2 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2018-01-29  2:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (01/26/18 16:26), Petr Mladek wrote:
[..]
> First, this delays showing eventually valuable information until
> the preemption is enabled. It might never happen if the system
> is in big troubles. In each case, it might be much longer delay
> than it was before.

If the system is in "big troubles" then what makes irq_work more
possible? Local IRQs can stay disabled, just like preemption. I
guess when the troubles are really big our strategy is the same
for both wq and irq_work solutions - we keep the printk_safe buffer
and wait for panic()->flush.

> Second, it makes printk() dependent on another non-trivial subsystem.
> I mean workqueues.
[..]
> The following, a bit ugly, solution came to my mind. We could
> think about it like extending the printk_context. It counts
> printks called in this context and does nothing when we reach
> the limit. The difference is that the context is task-specific
> instead of CPU-specific.
[..]
> +int console_recursion_count;
> +int console_recursion_limit = 100;

Hm... I'm not entirely happy with magic constants, to be honest.
Why 100? One of the printk_safe lockdep reports I saw was ~270
lines long: https://marc.info/?l=linux-kernel&m=150659041411473&w=2

`console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
a bit useless and hard to understand - despite its value we will
store only 100 lines.

We probably can replace `console_recursion_limit' with the following:
- in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
  chars to be stored in printk-safe buffer and, once we reached the limit,
  don't append any new messages until we are out of `console_recursion'
  context. Which is somewhat close to wq solution, the difference is that
  printk_safe can happen earlier if local IRQs are enabled. But at the
  same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
  still cause troubles, just because printk-deadlock errors sound scary
  and important enough.

I guess I'm OK with the wq dependency after all, but I may be mistaken.
printk_safe was never about "immediately flush the buffer", it was about
"avoid deadlocks", which was extended to "flush from any context which
will let us avoid deadlock". It just happened that it inherited
irq_work dependency from printk_nmi.

We could probably also add a printk_safe flush to some sysrq
handler, maybe.

> PS: I answered this mail because the original discussion[1] has
> already been way too long and covered many other issues.

Yep, that's why I started this as a new discussion.

	-ss


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-01-29  2:29   ` Sergey Senozhatsky
@ 2018-01-30 12:23     ` Petr Mladek
  2018-02-01  2:46       ` Sergey Senozhatsky
  2018-02-01 18:00     ` Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2018-01-30 12:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon 2018-01-29 11:29:18, Sergey Senozhatsky wrote:
> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

But the patch still uses irq_work because queue_work_on() cannot
be safely called from printk_safe(). In other words, it requires
both irq_work and workqueues to be functional. Note that there
might be a deadlock or livelock in the workqueue subsystem. It is
just another non-trivial thing that might go wrong.

Also, interrupts are handled immediately when they are enabled.
On the other hand, a workqueue work is processed only when the
worker is scheduled and the work is first in the queue. That
might take ages if there is a high load on the CPU or on the
given workqueue.


> > Second, it makes printk() dependent on another non-trivial subsystem.
> > I mean workqueues.
> [..]
> > The following, a bit ugly, solution came to my mind. We could
> > think about it like extending the printk_context. It counts
> > printks called in this context and does nothing when we reach
> > the limit. The difference is that the context is task-specific
> > instead of CPU-specific.
> [..]
> > +int console_recursion_count;
> > +int console_recursion_limit = 100;
> 
> Hm... I'm not entirely happy with magic constants, to be honest.
> Why 100? One of the printk_safe lockdep reports I saw was ~270
> lines long: https://marc.info/?l=linux-kernel&m=150659041411473&w=2

I am not happy with this constant either. It was used just for
simplicity in the RFC.


> `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> a bit useless and hard to understand - despite its value we will
> store only 100 lines.
>
> We probably can replace `console_recursion_limit' with the following:
> - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
>   chars to be stored in printk-safe buffer and, once we reached the limit,
>   don't append any new messages until we are out of `console_recursion'
>   context. Which is somewhat close to wq solution, the difference is that
>   printk_safe can happen earlier if local IRQs are enabled.

I like this idea. It would actually make perfect sense to use the same
limit for PRINTK_SAFE buffer size and for the printk recursion.

They both should be big enough to allow a meaningful report. On
the other hand, they both should be relatively small: one because
of memory constraints, the other to reduce redundancy.

In any case, there is a direct dependency. The recursive messages
are stored into the printk_safe buffer.


>   same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
>   still cause troubles, just because printk-deadlock errors sound scary
>   and important enough.

We could always make it more complicated if people come up with
a reasonable use case. IMHO, most people will not care about
these limits.


> I guess I'm OK with the wq dependency after all, but I may be mistaken.
> printk_safe was never about "immediately flush the buffer", it was about
> "avoid deadlocks", which was extended to "flush from any context which
> will let us avoid deadlock". It just happened that it inherited
> irq_work dependency from printk_nmi.

I see the point. But if I remember correctly, it was also designed
before we started to be concerned about sudden death and the "get
printks out ASAP" mantra.

Best Regards,
Petr


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-01-30 12:23     ` Petr Mladek
@ 2018-02-01  2:46       ` Sergey Senozhatsky
  2018-02-02 12:17         ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2018-02-01  2:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (01/30/18 13:23), Petr Mladek wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> But the patch still uses irq work because queue_work_on() could not
> be safely called from printk_safe(). In other words, it requires
> both irq_work and workqueues to be functional.

Right, that's all true. The reason it's done this way is that the
buffers can be big and we still flush under console_sem in the
console_unlock() loop, which can in theory be problematic. In other
words, I wanted to remove the root cause: the irq flush of printk_safe
while we are still in the printing loop. Technically, we minimize the
probability by throttling down the printk_safe flush, but we don't
eliminate the possibility entirely. Maybe that is good enough, maybe
not. Opinions?

[..]
> > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> > a bit useless and hard to understand - despite its value we will
> > store only 100 lines.
> >
> > We probably can replace `console_recursion_limit' with the following:
> > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
> >   chars to be stored in printk-safe buffer and, once we reached the limit,
> >   don't append any new messages until we are out of `console_recursion'
> >   context. Which is somewhat close to wq solution, the difference is that
> >   printk_safe can happen earlier if local IRQs are enabled.

      ^^^^^ printk_safe flush

> I like this idea. It would actually make perfect sense to use the same
> limit for PRINTK_SAFE buffer size and for the printk recursion.

Yes, we probably can do it that way, but this thing

    " They both should be big enough to "

is a bit of a concern. The "big enough to" can lead to different things.

> > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > printk_safe was never about "immediately flush the buffer", it was about
> > "avoid deadlocks", which was extended to "flush from any context which
> > will let us avoid deadlock". It just happened that it inherited
> > irq_work dependency from printk_nmi.
> 
> I see the point. But if I remember correctly, it was also designed
> before we started to be concerned about a sudden death and "get
> printks out ASAP" mantra.

Can you elaborate a bit?

	-ss


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-01-29  2:29   ` Sergey Senozhatsky
  2018-01-30 12:23     ` Petr Mladek
@ 2018-02-01 18:00     ` Steven Rostedt
  2018-02-02  1:07       ` Sergey Senozhatsky
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2018-02-01 18:00 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon, 29 Jan 2018 11:29:18 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.  
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

Working on the RT kernel, I can tell you there's a huge difference
between getting an irq_work to trigger and expecting something to
schedule.

It takes much less to disable the system's scheduling than it does to
disable interrupts. If interrupts are disabled, then so is scheduling.
But if scheduling is disabled, interrupts may still trigger.

But if printk_safe() is just for recursion protection, how important is
it to get out?

-- Steve


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-02-01 18:00     ` Steven Rostedt
@ 2018-02-02  1:07       ` Sergey Senozhatsky
  2018-02-02 10:23         ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2018-02-02  1:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (02/01/18 13:00), Steven Rostedt wrote:
> On Mon, 29 Jan 2018 11:29:18 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> Working on the RT kernel, I would tell you there's a huge difference
> getting an irq_work to trigger than to expect something to schedule.
> 
> It takes much less to disable the systems scheduling than it does to
> disable interrupts. If interrupts are disabled, then so is scheduling.
> But if scheduling is disabled, interrupts may still trigger.

Sure, I understand those points and don't object to any of them. Just
weighing the risks and benefits.

> But if printk_safe() is just for recursion protection, how important is
> it to get out?

Well, it depends. printk_safe can protect us against... what should I
call it... let's call it first-order, or direct, printk recursion: the
kind which involves locks internal to printk. For instance,
	vprintk_emit()
	 spin_lock_irqsave(&logbuf_lock)
	  spin_lock_debug(...)
	   spin_dump()
	    printk()
	     vprintk_emit()
	      spin_lock_irqsave(&logbuf_lock)  << deadlock

printk_safe will save us by redirecting spin_dump()->printk().

printk_safe cannot protect us against second-order, or indirect,
printk recursion: the kind that involves locks which are external
to printk. In case of indirect recursion we can store the lockdep
report in printk_safe, so we will not immediately recurse back into
printk, but in general it does not mean that we will actually be
safe from deadlocks. For instance,

   timekeeping
    spin_lock_irqsave timer base
     printk
      call_console_drivers
       foo_console_driver
        mod_timer
	 spin_lock_irqsave timer base   << deadlock

printk_safe will not save us here, and most likely the irq_work will
not happen on this CPU, because we deadlocked with IRQs disabled.

So printk_safe output is in general of some interest, but we have no
guarantee that it will be printed: if it was direct printk recursion,
then it's all good; if indirect, then it may not be.

	-ss


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-02-02  1:07       ` Sergey Senozhatsky
@ 2018-02-02 10:23         ` Petr Mladek
  0 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2018-02-02 10:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-02-02 10:07:20, Sergey Senozhatsky wrote:
> On (02/01/18 13:00), Steven Rostedt wrote:
> > On Mon, 29 Jan 2018 11:29:18 +0900
> > Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > Working on the RT kernel, I would tell you there's a huge difference
> > getting an irq_work to trigger than to expect something to schedule.

And this is not only about scheduling. It is also about having
a worker ready to handle the work. So there is yet another
layer that can potentially fail.


> > But if printk_safe() is just for recursion protection, how important is
> > it to get out?

Good question!

> Well, it depends. printk_safe can protect us against... what should I
> call it... let's call it first order, or direct, printk recursion. The
> one which involves locks internal to printk. For instance,
> 
> 	vprintk_emit()
> 	 spin_lock_irqsave(&logbuf_lock)
> 	  spin_lock_debug(...)
> 	   spin_dump()
> 	    printk()
> 	     vprintk_emit()
> 	      spin_lock_irqsave(&logbuf_lock)  << deadlock
> 
> printk_safe will save us by redirecting spin_dump()->printk().
> 
> So printk_safe output is in general of some interest, but we have no
> guarantee that it will be printed: direct printk recursion is handled
> fine, but indirect recursion may not be.

IMHO, the question is whether the information about printk recursion
would help to understand the situation on the system or whether it
would just add noise to the original problem.

I would personally prefer to know about it. But I do not feel
experienced enough to make a generic decision.

Best Regards,
Petr


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-02-01  2:46       ` Sergey Senozhatsky
@ 2018-02-02 12:17         ` Petr Mladek
  2018-02-02 14:18           ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2018-02-02 12:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> On (01/30/18 13:23), Petr Mladek wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > But the patch still uses irq work because queue_work_on() could not
> > be safely called from printk_safe(). In other words, it requires
> > both irq_work and workqueues to be functional.
> 
> Right, that's all true. The reason it's done this way is that buffers can
> be big and we still flush under console_sem in the console_unlock() loop,
> which can in theory be problematic. In other words, I wanted to remove the
> root cause - the irq flush of printk_safe while we are still in the
> printing loop.

Good point! We know that we would eventually push a non-trivial amount
of messages and it would make sense to do it from non-atomic context.

On the other hand, this does not solve the same problem with the printk
NMI buffer. And I guess that we do not want to risk offloading NMI
messages to workqueues.


> > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > printk_safe was never about "immediately flush the buffer", it was about
> > > "avoid deadlocks", which was extended to "flush from any context which
> > > will let us avoid deadlock". It just happened that it inherited
> > > the irq_work dependency from printk_nmi.
> > 
> > I see the point. But if I remember correctly, it was also designed
> > before we started to be concerned about a sudden death and "get
> > printks out ASAP" mantra.
> 
> Can you elaborate a bit?

The pull request with printk_safe was sent on February 22, 2017, see
https://lkml.kernel.org/r/20170222114705.GA30336@linux.suse

The printk softlockup was still being solved by an immediate offload
from vprintk_emit() on March 29, 2017, see
https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhatsky@gmail.com

I believe that it was the mail from Pavel Machek that made us
start thinking about the sudden death. It was sent on April 7, 2017,
see https://lkml.kernel.org/r/20170407120642.GB4756@amd

The first version with the offload from console_unlock was
sent on May 9, 2017, see
https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhatsky@gmail.com

I am not exactly sure when the "get printks out ASAP" mantra started
but I cannot forget the mail from June 30, 2017, see
https://lkml.kernel.org/r/20170630070131.GA474@jagdpanzerIV.localdomain

Best Regards,
Petr


* Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
  2018-02-02 12:17         ` Petr Mladek
@ 2018-02-02 14:18           ` Petr Mladek
  0 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2018-02-02 14:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-02-02 13:17:08, Petr Mladek wrote:
> On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> > On (01/30/18 13:23), Petr Mladek wrote:
> > [..]
> > > > If the system is in "big troubles" then what makes irq_work more
> > > > possible? Local IRQs can stay disabled, just like preemption. I
> > > > guess when the troubles are really big our strategy is the same
> > > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > > and wait for panic()->flush.
> > > 
> > > But the patch still uses irq work because queue_work_on() could not
> > > be safely called from printk_safe(). In other words, it requires
> > > both irq_work and workqueues to be functional.
> > 
> > Right, that's all true. The reason it's done this way is that buffers can
> > be big and we still flush under console_sem in the console_unlock() loop,
> > which can in theory be problematic. In other words, I wanted to remove the
> > root cause - the irq flush of printk_safe while we are still in the
> > printing loop.
> 
> Good point! We know that we would eventually push a non-trivial amount
> of messages and it would make sense to do it from non-atomic context.
> 
> On the other hand, this does not solve the same problem with the printk
> NMI buffer. And I guess that we do not want to risk offloading NMI
> messages to workqueues.

By the above I want to say that I am not sure it is worth having
different solutions for flushing the printk_safe and NMI buffers. We
might be forced to debug why a solution does not work. A shared
solution, with shared issues, might save us some work.

I know that my alternative solution needed some extra code for
printk_safe as well. But it is pretty straightforward. It is
basically just a check of a counter.
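I do not know the exact shape of that patch, but the counter idea can
be sketched in plain C (all names here are hypothetical): printk_safe
bumps a counter of pending side-buffer messages, and the
console_unlock() printing loop checks that counter and flushes the
side buffer itself, with no irq_work or workqueue hop in the path.

```c
#include <assert.h>

/* Hypothetical sketch of a counter-based flush, not kernel code. */
static int safe_pending;   /* messages parked in the printk_safe buffer */
static int printed;        /* messages pushed out to the console        */

static void vprintk_safe_store(void)
{
	safe_pending++;    /* recursive printk() lands here */
}

static void printk_safe_flush(void)
{
	printed += safe_pending;
	safe_pending = 0;
}

static void console_unlock(void)
{
	int loops = 0;

	do {
		/* ... print pending logbuf messages here ... */
		if (safe_pending)        /* the "check of a counter" */
			printk_safe_flush();
		/* Bound the loop so recursion cannot starve us. */
	} while (safe_pending && ++loops < 10);
}
```

The flush happens in whatever context already owns console_sem, so
nothing extra (irq_work, worker) has to be functional.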

Note that the WQ-based solution is kind of tricky. There is
the journey printk() -> irq_work -> wq. Also, it relies on
the fact that console_unlock() is called with preemption disabled.
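That two-hop journey can be simulated in plain C (illustrative names,
not kernel code) to show why both mechanisms must be functional before
anything is flushed: the irq_work handler only schedules the worker,
and the actual flush happens in the worker.

```c
#include <assert.h>
#include <stddef.h>

/* Each hop is a pending callback that a later "context" runs. */
typedef void (*callback_t)(void);

static callback_t pending_irq_work;   /* runs when IRQs are enabled     */
static callback_t pending_wq_item;    /* runs in preemptible context    */
static int flushed;

static void flush_worker(void)
{
	flushed = 1;                  /* the real flush, hop two */
}

/* irq_work handler: does not flush itself, only queues the worker. */
static void irq_work_handler(void)
{
	pending_wq_item = flush_worker;
}

static void vprintk_safe_store(void)
{
	pending_irq_work = irq_work_handler;  /* hop one queued */
}

static void run_irq_work(void)
{
	callback_t cb = pending_irq_work;

	pending_irq_work = NULL;
	if (cb)
		cb();
}

static void run_workqueue(void)
{
	callback_t cb = pending_wq_item;

	pending_wq_item = NULL;
	if (cb)
		cb();
}
```

If either run_irq_work() or run_workqueue() never happens (IRQs stay
disabled, or no worker is available), the message stays parked.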


> > > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > > printk_safe was never about "immediately flush the buffer", it was about
> > > > "avoid deadlocks", which was extended to "flush from any context which
> > > > will let us avoid deadlock". It just happened that it inherited
> > > > the irq_work dependency from printk_nmi.
> > > 
> > > I see the point. But if I remember correctly, it was also designed
> > > before we started to be concerned about a sudden death and "get
> > > printks out ASAP" mantra.
> > 
> > Can you elaborate a bit?

You probably knew the dates. Anyway, what I want to say is this:
even though the messages stored in the printk_safe buffer might be
less important, they still might be interesting and we might want
to get them out sooner rather than later.

All in all, it seems that we are basically down to the dilemma
between "get messages out ASAP" vs. "avoid softlockups".
We need to take into account:

   + printk_safe messages are more or less printk-subsystem
     specific. They might be "less" important for others.
     But who knows for sure?

   + the code complexity


With the above in mind, I still prefer the "easier" solution based
on the counter. If people had problems with softlockups,
we would need to solve them. But then they would probably have
this problem also with normal printk or in NMI, and we would
probably need a solution that does not use workqueues.

Does this make any sense?

Best Regards,
Petr


PS: I am sorry for so many mails today. Sometimes (often?), it is so
hard to understand and say things clearly. And it is Friday.


end of thread, other threads:[~2018-02-02 14:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-24  9:37 [RFC][PATCH] printk: do not flush printk_safe from irq_work Sergey Senozhatsky
2018-01-26 15:26 ` Petr Mladek
2018-01-29  2:29   ` Sergey Senozhatsky
2018-01-30 12:23     ` Petr Mladek
2018-02-01  2:46       ` Sergey Senozhatsky
2018-02-02 12:17         ` Petr Mladek
2018-02-02 14:18           ` Petr Mladek
2018-02-01 18:00     ` Steven Rostedt
2018-02-02  1:07       ` Sergey Senozhatsky
2018-02-02 10:23         ` Petr Mladek
