All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup
@ 2015-05-25 12:46 Petr Mladek
  2015-05-25 12:46 ` [PATCH 01/10] printk: Avoid deadlock in NMI context Petr Mladek
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

The main source of deadlocks caused by printk() in NMI context has been
solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
trace on all CPUs").

But there are still few warnings printed in the NMI code that could
case a deadlock. For example, see the freeze discussed at
https://lkml.org/lkml/2015/5/20/481

The warnings are not easy to avoid. We could not postpone them easily.
We could not allocate buffers in NMI context. Therefore we would need
some lockless mechanism to share a buffer between NMI and normal context.
But this would make printk() code much more complicated and it is not
worth it. There has already been an attempt to do so and it has been
rejected, see https://lkml.org/lkml/2014/6/10/388


This patch set provides much easier solution. It prevents from
the deadlock on logbuf_lock by using trylock rather than spin_lock.
If the lock can not be taken, the message is ignored and some
warning is printed later on.

1st patch adds the simple solution.

2nd, 4th, 5th patches makes the solution a bit more robust.

3rd patch just shuffle some code to make 4th patch trivial.

6th..10th patches split the spaghetti vprintk_emit() into more
pieces. They basically just shuffle the code to make it
better readable.


The patch have been tested against Linus' tree but it can be applied
also against linux-next.

Petr Mladek (10):
  printk: Avoid deadlock in NMI context
  printk: Try harder to get logbuf_lock on NMI
  printk: Move the deferred printk stuff
  printk: Merge and flush NMI buffer predictably via IRQ work
  printk: Try hard to print Oops message in NMI context
  printk: Split delayed printing of warnings from vprintk_emit()
  printk: Split text formatting and analyze from vprintk_emit()
  printk: Detect scheduler messages in vprintk_format_and_analyze()
  printk: Split text storing logic from vprintk_emit()
  printk: Split console call from vprintk_emit()

 kernel/printk/printk.c | 378 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 269 insertions(+), 109 deletions(-)

-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/10] printk: Avoid deadlock in NMI context
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-27 23:13   ` Andrew Morton
  2015-05-25 12:46 ` [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI Petr Mladek
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

printk() cannot be used in NMI context safely because it uses an internal
lock and thus could cause a deadlock. This is fine because NMI context
is very special. The handlers should be short, effective, and do not
use printk().

But developers tend to print warnings even from NMI code. They are pretty
hard to debug when they lockup the machine and nothing appears in the logs.

This patch prevents from the deadlock on logbuf_lock by using trylock
rather than spin_lock. If the lock can not be taken, the message is
ignored and some warning is printed later on.

We also must not try to get console from NMI context. It needs
even more locks and there is even higher chance to hung up.

Unfortunately, we could not print more details about the lost message.
We could not alloc a buffer in NMI. Therefore we would need some
lockless mechanism to share a buffer between NMI and normal context.
But this would make printk() code much more complicated and
it is not worth it. There has already been an attempt to do so
and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
This is also the reason why we use the atomic counter.

Hint: If anybody wants to know what NMI message is being lost,
trace_dump_stack(0) can be called when try_lock() fails. Then
trace_printk() could be used on that place.

This is only the basic logic. There will be some improvements in
the followup patches.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c099b082cd02..94fcf6f0b542 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1615,6 +1615,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *fmt, va_list args)
 {
 	static int recursion_bug;
+	static atomic_t nmi_message_lost;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
 	size_t text_len = 0;
@@ -1641,7 +1642,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
-	if (unlikely(logbuf_cpu == this_cpu)) {
+	if (unlikely(logbuf_cpu == this_cpu) && !in_nmi()) {
 		/*
 		 * If a crash is occurring during printk() on this CPU,
 		 * then try to get the crash message out but make sure
@@ -1658,7 +1659,26 @@ asmlinkage int vprintk_emit(int facility, int level,
 	}
 
 	lockdep_off();
-	raw_spin_lock(&logbuf_lock);
+
+	/*
+	 * printk() should not be used in NMI context because the interrupt
+	 * might came when lockbuf_lock is taken. Such situation is hard
+	 * to debug. Therefore, we try to avoid the deadlock, use trylock
+	 * in NMI context, and later warn about lost messages.
+	 *
+	 * Hint: If you get into troubles, use trace_dump_stack(0)
+	 * to get location of the lost message and use trace_printk()
+	 * there.
+	 */
+	if (!in_nmi()) {
+		raw_spin_lock(&logbuf_lock);
+	} else if (!raw_spin_trylock(&logbuf_lock)) {
+		atomic_inc(&nmi_message_lost);
+		lockdep_on();
+		local_irq_restore(flags);
+		return 0;
+	}
+
 	logbuf_cpu = this_cpu;
 
 	if (unlikely(recursion_bug)) {
@@ -1672,6 +1692,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	if (unlikely(atomic_read(&nmi_message_lost))) {
+		int lost = atomic_xchg(&nmi_message_lost, 0);
+
+		text_len = scnprintf(text, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     lost);
+		/* emit KERN_CRIT message */
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, text, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -1759,8 +1790,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	/*
+	 * If called from the scheduler or NMI context, we can not get console
+	 * without a possible deadlock.
+	 */
+	if (!in_sched && !in_nmi()) {
 		lockdep_off();
 		/*
 		 * Disable preemption to avoid being preempted while holding
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
  2015-05-25 12:46 ` [PATCH 01/10] printk: Avoid deadlock in NMI context Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-27 23:14   ` Andrew Morton
  2015-05-25 12:46 ` [PATCH 03/10] printk: Move the deferred printk stuff Petr Mladek
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

If the logbuf_lock is not available immediately, it does not mean
that there is a deadlock. We should try harder and wait a bit.

On the other hand, we must not forget that we are in NMI and the timeout
has to be rather small. It must not cause dangerous stalls.

I even got full system freeze when the timeout was 10ms and I printed
backtraces from all CPUs. In this case, all CPUs were blocked for
too long.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 94fcf6f0b542..e6c00d6ee8dc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -231,6 +231,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
+/* cpu currently holding logbuf_lock */
+static unsigned int logbuf_cpu = UINT_MAX;
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
@@ -1610,6 +1612,38 @@ static size_t cont_print_text(char *text, size_t size)
 	return textlen;
 }
 
+/*
+ * This value defines the maximum delay that we spend waiting for logbuf_lock
+ * in NMI context. 100us looks like a good compromise. Note that, for example,
+ * syslog_print_all() might hold the lock for quite some time. On the other
+ * hand, waiting 10ms caused system freeze when many backtraces were printed
+ * in NMI.
+ */
+#define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL
+
+/* We must be careful in NMI when we managed to preempt a running printk */
+static int try_logbuf_lock_in_nmi(void)
+{
+	u64 start_time, current_time;
+	int this_cpu = smp_processor_id();
+
+	/* no way if we are already locked on this CPU */
+	if (logbuf_cpu == this_cpu)
+		return 0;
+
+	/* try hard to get the lock but do not wait forever */
+	start_time = cpu_clock(this_cpu);
+	current_time = start_time;
+	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
+		if (raw_spin_trylock(&logbuf_lock))
+			return 1;
+		cpu_relax();
+		current_time = cpu_clock(this_cpu);
+	}
+
+	return 0;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
@@ -1624,8 +1658,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int this_cpu;
 	int printed_len = 0;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -1672,7 +1704,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 */
 	if (!in_nmi()) {
 		raw_spin_lock(&logbuf_lock);
-	} else if (!raw_spin_trylock(&logbuf_lock)) {
+	} else if (!try_logbuf_lock_in_nmi()) {
 		atomic_inc(&nmi_message_lost);
 		lockdep_on();
 		local_irq_restore(flags);
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/10] printk: Move the deferred printk stuff
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
  2015-05-25 12:46 ` [PATCH 01/10] printk: Avoid deadlock in NMI context Petr Mladek
  2015-05-25 12:46 ` [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work Petr Mladek
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

We will want to use the workqueue also in NMI because we could not take
the console safely there.

This patch just move the code up in the source file. There is no
other change.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 108 ++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 54 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e6c00d6ee8dc..bf2abdda5869 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1509,6 +1509,60 @@ static inline void printk_delay(void)
 }
 
 /*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_OUTPUT) {
+		/* If trylock fails, someone else is doing the printing */
+		if (console_trylock())
+			console_unlock();
+	}
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	preempt_disable();
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	preempt_enable();
+
+	return r;
+}
+
+/*
  * Continuation lines are buffered, and not committed to the record buffer
  * until the line is complete, or a race forces it. The line fragments
  * though, are printed immediately to the consoles to ensure everything has
@@ -2665,60 +2719,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (2 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 03/10] printk: Move the deferred printk stuff Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-27 23:14   ` Andrew Morton
  2015-05-25 12:46 ` [PATCH 05/10] printk: Try hard to print Oops message in NMI context Petr Mladek
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

It might take ages until users see messages from NMI context. They cannot
be flushed to the console because the operation involves taking and
releasing a bunch of locks. Everything gets fixed by the followup printk
in normal context but it is not predictable.

The same problem has printk_sched() and this patch reuses the existing
solution.

There is no special printk() variant for NMI context. Hence the IRQ work
need to get queued from vprintk_emit().

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bf2abdda5869..c2ae9ff388ae 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1554,9 +1554,6 @@ int printk_deferred(const char *fmt, ...)
 	va_start(args, fmt);
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
 
 	return r;
@@ -1880,7 +1877,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * If called from the scheduler or NMI context, we can not get console
 	 * without a possible deadlock.
 	 */
-	if (!in_sched && !in_nmi()) {
+	if (in_sched || in_nmi()) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	} else {
 		lockdep_off();
 		/*
 		 * Disable preemption to avoid being preempted while holding
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/10] printk: Try hard to print Oops message in NMI context
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (3 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 06/10] printk: Split delayed printing of warnings from vprintk_emit() Petr Mladek
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

Oops messages are important for debugging. We should try harder to get
them into the ring buffer and print them to the console. This is
problematic in NMI context because the needed locks might
already be taken.

What we can do, though, is to zap all printk locks. We already do this
when a printk recursion is detected. This should be safe because
the system is crashing and there shouldn't be any printk caller
that would cause the deadlock.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c2ae9ff388ae..8fb0aaaa6258 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1672,14 +1672,20 @@ static size_t cont_print_text(char *text, size_t size)
  */
 #define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL
 
-/* We must be careful in NMI when we managed to preempt a running printk */
+/*
+ * We must be careful in NMI when we managed to preempt a running printk.
+ * Special case are Oops messages from NMI context. We try hard to print
+ * them and forcefully drop existing locks.
+ */
 static int try_logbuf_lock_in_nmi(void)
 {
 	u64 start_time, current_time;
 	int this_cpu = smp_processor_id();
 
-	/* no way if we are already locked on this CPU */
-	if (logbuf_cpu == this_cpu)
+	if (oops_in_progress)
+		zap_locks();
+	else if (logbuf_cpu == this_cpu)
+		/* no way if we are already locked on this CPU */
 		return 0;
 
 	/* try hard to get the lock but do not wait forever */
@@ -1876,8 +1882,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/*
 	 * If called from the scheduler or NMI context, we can not get console
 	 * without a possible deadlock.
+	 *
+	 * The only exception are Oops messages from NMI context where all
+	 * relevant locks have been forcefully dropped in
+	 * try_logbuf_lock_in_nmi(). We have to try to get the console,
+	 * otherwise the last messages would get lost.
 	 */
-	if (in_sched || in_nmi()) {
+	if (in_sched || (in_nmi() && !oops_in_progress)) {
 		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	} else {
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/10] printk: Split delayed printing of warnings from vprintk_emit()
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (4 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 05/10] printk: Try hard to print Oops message in NMI context Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 07/10] printk: Split text formatting and analyze " Petr Mladek
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

printk_emit() is too long. Let's split the delayed printing of warnings
into a separate function.

This patch just shuffles the code. There is no change in the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 64 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8fb0aaaa6258..91b8043044ac 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -233,6 +233,9 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* cpu currently holding logbuf_lock */
 static unsigned int logbuf_cpu = UINT_MAX;
+/* flags used for delayed printing of warnings */
+static int recursion_bug;
+static atomic_t nmi_message_lost;
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
@@ -269,6 +272,8 @@ static u32 clear_idx;
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
+/* helper buffer to print formatted text */
+static char textbuf[LOG_LINE_MAX];
 
 /* Return log buffer address */
 char *log_buf_addr_get(void)
@@ -1701,13 +1706,44 @@ static int try_logbuf_lock_in_nmi(void)
 	return 0;
 }
 
+/*
+ * This function modifies the global textbuf and therefore it must be called
+ * under lockbuf_lock!
+ */
+static int vprintk_delayed_warnings(void)
+{
+	int printed_len = 0;
+	int text_len;
+
+	if (unlikely(recursion_bug)) {
+		static const char recursion_msg[] =
+			"BUG: recent printk recursion!";
+
+		recursion_bug = 0;
+		/* emit KERN_CRIT message */
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, recursion_msg,
+					 strlen(recursion_msg));
+	}
+
+	if (unlikely(atomic_read(&nmi_message_lost))) {
+		int lost = atomic_xchg(&nmi_message_lost, 0);
+
+		text_len = scnprintf(textbuf, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     lost);
+		/* emit KERN_CRIT message */
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, textbuf, text_len);
+	}
+
+	return printed_len;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
-	static int recursion_bug;
-	static atomic_t nmi_message_lost;
-	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
 	size_t text_len = 0;
 	enum log_flags lflags = 0;
@@ -1770,27 +1806,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	logbuf_cpu = this_cpu;
 
-	if (unlikely(recursion_bug)) {
-		static const char recursion_msg[] =
-			"BUG: recent printk recursion!";
-
-		recursion_bug = 0;
-		/* emit KERN_CRIT message */
-		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
-					 NULL, 0, recursion_msg,
-					 strlen(recursion_msg));
-	}
-
-	if (unlikely(atomic_read(&nmi_message_lost))) {
-		int lost = atomic_xchg(&nmi_message_lost, 0);
-
-		text_len = scnprintf(text, sizeof(textbuf),
-				     "BAD LUCK: lost %d message(s) from NMI context!",
-				     lost);
-		/* emit KERN_CRIT message */
-		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
-					 NULL, 0, text, text_len);
-	}
+	printed_len += vprintk_delayed_warnings();
 
 	/*
 	 * The printf needs to come first; we need the syslog
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/10] printk: Split text formatting and analyze from vprintk_emit()
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (5 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 06/10] printk: Split delayed printing of warnings from vprintk_emit() Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 08/10] printk: Detect scheduler messages in vprintk_format_and_analyze() Petr Mladek
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

vprintk_emit() is too long. Let's move the text formatting and
the text analyze into a separate function.

I am not super happy with the result. The primary aim is to reduce
future mistakes when backporting fixes in this code to older stable
kernel releases. Therefore I want to keep the changes in the code
at minimum. Especially, I want to avoid conversion of @text into
"char **" because it is harder to read.

Well, there is one exception. @level is accessed via a pointer.
It looks safe because compiler warns about type mismatch when
the access is not updated. Also it used only in a very
simple code.

Better ideas are welcome!

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 122 +++++++++++++++++++++++++++++++------------------
 1 file changed, 78 insertions(+), 44 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 91b8043044ac..7d009144f97f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1740,13 +1740,85 @@ static int vprintk_delayed_warnings(void)
 	return printed_len;
 }
 
+/**
+ * vprintk_format_and_analyze - format and analyze the text message
+ * @fmt: text format
+ * @args: argumetns used to generate the formated text
+ * @dict: pointer to the message dictionary
+ * @facility: syslog facility
+ * @level: syslog level; might be replaced by the one found in the fomatted text
+ * @ftext: pointer to the formatted text after the syslog prefix
+ * @ftext_len: length of the formatted text without the syslog prefix
+ *
+ * This function modifies the global textbuf and therefore it must be called
+ * under lockbuf_lock!
+ */
+static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
+						 const char *dict, int facility,
+						 int *level,
+						 char **ftext,
+						 size_t *ftext_len)
+{
+	char *text = textbuf;
+	size_t text_len;
+	enum log_flags lflags = 0;
+
+	/*
+	 * The printf needs to come first; we need the syslog
+	 * prefix which might be passed-in as a parameter.
+	 */
+	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+
+	/* mark and strip a trailing newline */
+	if (text_len && text[text_len-1] == '\n') {
+		text_len--;
+		lflags |= LOG_NEWLINE;
+	}
+
+	/* strip kernel syslog prefix and extract log level or control flags */
+	if (facility == 0) {
+		int kern_level = printk_get_level(text);
+
+		if (kern_level) {
+			const char *end_of_header = printk_skip_level(text);
+
+			switch (kern_level) {
+			case '0' ... '7':
+				if (*level == LOGLEVEL_DEFAULT)
+					*level = kern_level - '0';
+				/* fallthrough */
+			case 'd':	/* KERN_DEFAULT */
+				lflags |= LOG_PREFIX;
+			}
+			/*
+			 * No need to check length here because vscnprintf
+			 * put '\0' at the end of the string. Only valid and
+			 * newly printed level is detected.
+			 */
+			text_len -= end_of_header - text;
+			text = (char *)end_of_header;
+		}
+	}
+
+	if (*level == LOGLEVEL_DEFAULT)
+		*level = default_message_loglevel;
+
+	if (dict)
+		lflags |= LOG_PREFIX|LOG_NEWLINE;
+
+	*ftext = text;
+	*ftext_len = text_len;
+
+	return lflags;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
-	char *text = textbuf;
-	size_t text_len = 0;
-	enum log_flags lflags = 0;
+	char *text;
+	size_t text_len;
+	enum log_flags lflags;
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
@@ -1808,47 +1880,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += vprintk_delayed_warnings();
 
-	/*
-	 * The printf needs to come first; we need the syslog
-	 * prefix which might be passed-in as a parameter.
-	 */
-	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
-
-	/* mark and strip a trailing newline */
-	if (text_len && text[text_len-1] == '\n') {
-		text_len--;
-		lflags |= LOG_NEWLINE;
-	}
-
-	/* strip kernel syslog prefix and extract log level or control flags */
-	if (facility == 0) {
-		int kern_level = printk_get_level(text);
-
-		if (kern_level) {
-			const char *end_of_header = printk_skip_level(text);
-			switch (kern_level) {
-			case '0' ... '7':
-				if (level == LOGLEVEL_DEFAULT)
-					level = kern_level - '0';
-				/* fallthrough */
-			case 'd':	/* KERN_DEFAULT */
-				lflags |= LOG_PREFIX;
-			}
-			/*
-			 * No need to check length here because vscnprintf
-			 * put '\0' at the end of the string. Only valid and
-			 * newly printed level is detected.
-			 */
-			text_len -= end_of_header - text;
-			text = (char *)end_of_header;
-		}
-	}
-
-	if (level == LOGLEVEL_DEFAULT)
-		level = default_message_loglevel;
-
-	if (dict)
-		lflags |= LOG_PREFIX|LOG_NEWLINE;
+	lflags = vprinkt_format_and_analyze(fmt, args, dict, facility,
+					    &level,
+					    &text, &text_len);
 
 	if (!(lflags & LOG_NEWLINE)) {
 		/*
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/10] printk: Detect scheduler messages in vprintk_format_and_analyze()
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (6 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 07/10] printk: Split text formatting and analyze " Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 09/10] printk: Split text storing logic from vprintk_emit() Petr Mladek
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

The special hack for messages printed by the scheduler belongs to
the new vprintk_format_and_analyze().

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7d009144f97f..6e53b6f60ca3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1747,15 +1747,16 @@ static int vprintk_delayed_warnings(void)
  * @dict: pointer to the message dictionary
  * @facility: syslog facility
  * @level: syslog level; might be replaced by the one found in the fomatted text
+ * @in_sched: is set to true when the message is printed by the scheduler
  * @ftext: pointer to the formatted text after the syslog prefix
  * @ftext_len: length of the formatted text without the syslog prefix
  *
  * This function modifies the global textbuf and therefore it must be called
  * under lockbuf_lock!
  */
-static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
+static enum log_flags vprintk_format_and_analyze(const char *fmt, va_list args,
 						 const char *dict, int facility,
-						 int *level,
+						 int *level, bool *in_sched,
 						 char **ftext,
 						 size_t *ftext_len)
 {
@@ -1775,6 +1776,15 @@ static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
 		lflags |= LOG_NEWLINE;
 	}
 
+	/*
+	 * Messages printed by the scheduler must not wake up the console.
+	 * They are detected via a special @level, see printk_deferred().
+	 */
+	if (*level == LOGLEVEL_SCHED) {
+		*level = LOGLEVEL_DEFAULT;
+		*in_sched = true;
+	}
+
 	/* strip kernel syslog prefix and extract log level or control flags */
 	if (facility == 0) {
 		int kern_level = printk_get_level(text);
@@ -1824,11 +1834,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int printed_len = 0;
 	bool in_sched = false;
 
-	if (level == LOGLEVEL_SCHED) {
-		level = LOGLEVEL_DEFAULT;
-		in_sched = true;
-	}
-
 	boot_delay_msec(level);
 	printk_delay();
 
@@ -1880,8 +1885,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len += vprintk_delayed_warnings();
 
-	lflags = vprinkt_format_and_analyze(fmt, args, dict, facility,
-					    &level,
+	lflags = vprintk_format_and_analyze(fmt, args, dict, facility,
+					    &level, &in_sched,
 					    &text, &text_len);
 
 	if (!(lflags & LOG_NEWLINE)) {
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/10] printk: Split text storing logic from vprintk_emit()
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (7 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 08/10] printk: Detect scheduler messages in vprintk_format_and_analyze() Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-25 12:46 ` [PATCH 10/10] printk: Split console call " Petr Mladek
  2015-05-29 20:50 ` [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Andrew Morton
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

vprintk_emit() is too long. Let's split the code that stores the actual
text buffer with respect to the cont buffer and the detected log flags.

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 96 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 39 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e53b6f60ca3..c0b97653987a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1822,6 +1822,61 @@ static enum log_flags vprintk_format_and_analyze(const char *fmt, va_list args,
 	return lflags;
 }
 
+/*
+ * This function stores the given text into the log buffer with respect
+ * to the cont buffer and the text flags.
+ *
+ * It must be called under lockbuf_lock!
+ */
+static int vprintk_store_text(int facility, int level, enum log_flags lflags,
+			      const char *dict, size_t dictlen,
+			      const char *text, size_t text_len)
+{
+	int printed_len = 0;
+
+	if (!(lflags & LOG_NEWLINE)) {
+		/*
+		 * Flush the conflicting buffer. An earlier newline was missing,
+		 * or another task also prints continuation lines.
+		 */
+		if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
+			cont_flush(LOG_NEWLINE);
+
+		/* buffer line if possible, otherwise store it right away */
+		if (cont_add(facility, level, text, text_len))
+			printed_len += text_len;
+		else
+			printed_len += log_store(facility, level,
+						 lflags | LOG_CONT, 0,
+						 dict, dictlen, text, text_len);
+	} else {
+		bool stored = false;
+
+		/*
+		 * If an earlier newline was missing and it was the same task,
+		 * either merge it with the current buffer and flush, or if
+		 * there was a race with interrupts (prefix == true) then just
+		 * flush it out and store this line separately.
+		 * If the preceding printk was from a different task and missed
+		 * a newline, flush and append the newline.
+		 */
+		if (cont.len) {
+			if (cont.owner == current && !(lflags & LOG_PREFIX))
+				stored = cont_add(facility, level, text,
+						  text_len);
+			cont_flush(LOG_NEWLINE);
+		}
+
+		if (stored)
+			printed_len += text_len;
+		else
+			printed_len += log_store(facility, level, lflags, 0,
+						 dict, dictlen, text, text_len);
+	}
+
+	return printed_len;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
@@ -1889,45 +1944,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 					    &level, &in_sched,
 					    &text, &text_len);
 
-	if (!(lflags & LOG_NEWLINE)) {
-		/*
-		 * Flush the conflicting buffer. An earlier newline was missing,
-		 * or another task also prints continuation lines.
-		 */
-		if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
-			cont_flush(LOG_NEWLINE);
-
-		/* buffer line if possible, otherwise store it right away */
-		if (cont_add(facility, level, text, text_len))
-			printed_len += text_len;
-		else
-			printed_len += log_store(facility, level,
-						 lflags | LOG_CONT, 0,
-						 dict, dictlen, text, text_len);
-	} else {
-		bool stored = false;
-
-		/*
-		 * If an earlier newline was missing and it was the same task,
-		 * either merge it with the current buffer and flush, or if
-		 * there was a race with interrupts (prefix == true) then just
-		 * flush it out and store this line separately.
-		 * If the preceding printk was from a different task and missed
-		 * a newline, flush and append the newline.
-		 */
-		if (cont.len) {
-			if (cont.owner == current && !(lflags & LOG_PREFIX))
-				stored = cont_add(facility, level, text,
-						  text_len);
-			cont_flush(LOG_NEWLINE);
-		}
-
-		if (stored)
-			printed_len += text_len;
-		else
-			printed_len += log_store(facility, level, lflags, 0,
-						 dict, dictlen, text, text_len);
-	}
+	printed_len += vprintk_store_text(facility, level, lflags,
+					  dict, dictlen, text, text_len);
 
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/10] printk: Split console call from vprintk_emit()
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (8 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 09/10] printk: Split text storing logic from vprintk_emit() Petr Mladek
@ 2015-05-25 12:46 ` Petr Mladek
  2015-05-29 20:50 ` [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Andrew Morton
  10 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-25 12:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Petr Mladek

vprintk_emit() is too long. Let's split the logic about when and how
to call the console into a separate function.

Well, it is not only about console but also about /dev/kmsg and syslog().
Hence the generic name vprintk_poke_loggers() seems to be more appropriate.

This patch just shuffles the code. There is no change in the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 72 ++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c0b97653987a..fb4af49c2d4e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1877,6 +1877,46 @@ static int vprintk_store_text(int facility, int level, enum log_flags lflags,
 	return printed_len;
 }
 
+/*
+ * Try to get console, wake up logging services, and flush the last
+ * messages there.
+ */
+static void vprintk_poke_loggers(bool in_sched)
+{
+	/*
+	 * If called from the scheduler or NMI context, we can not get console
+	 * without a possible deadlock. In this case, we must deffer the task
+	 * via IRQ work.
+	 *
+	 * The only exception are Oops messages from NMI context where all
+	 * relevant locks have been forcefully dropped in
+	 * try_logbuf_lock_in_nmi(). We have to try to get the console,
+	 * otherwise the last messages would get lost.
+	 */
+	if (in_sched || (in_nmi() && !oops_in_progress)) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	} else {
+		lockdep_off();
+		/*
+		 * Disable preemption to avoid being preempted while holding
+		 * console_sem which would prevent anyone from printing to
+		 * console
+		 */
+		preempt_disable();
+
+		/*
+		 * Try to acquire and then immediately release the console
+		 * semaphore.  The release will print out buffers and wake up
+		 * /dev/kmsg and syslog() users.
+		 */
+		if (console_trylock_for_printk())
+			console_unlock();
+		preempt_enable();
+		lockdep_on();
+	}
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
@@ -1952,37 +1992,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/*
-	 * If called from the scheduler or NMI context, we can not get console
-	 * without a possible deadlock.
-	 *
-	 * The only exception are Oops messages from NMI context where all
-	 * relevant locks have been forcefully dropped in
-	 * try_logbuf_lock_in_nmi(). We have to try to get the console,
-	 * otherwise the last messages would get lost.
-	 */
-	if (in_sched || (in_nmi() && !oops_in_progress)) {
-		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	} else {
-		lockdep_off();
-		/*
-		 * Disable preemption to avoid being preempted while holding
-		 * console_sem which would prevent anyone from printing to
-		 * console
-		 */
-		preempt_disable();
-
-		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
-		 */
-		if (console_trylock_for_printk())
-			console_unlock();
-		preempt_enable();
-		lockdep_on();
-	}
+	vprintk_poke_loggers(in_sched);
 
 	return printed_len;
 }
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/10] printk: Avoid deadlock in NMI context
  2015-05-25 12:46 ` [PATCH 01/10] printk: Avoid deadlock in NMI context Petr Mladek
@ 2015-05-27 23:13   ` Andrew Morton
  2015-05-28 12:00     ` Petr Mladek
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2015-05-27 23:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Mon, 25 May 2015 14:46:24 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> printk() cannot be used in NMI context safely because it uses an internal
> lock and thus could cause a deadlock. This is fine because NMI context
> is very special. The handlers should be short, effective, and do not
> use printk().
> 
> But developers tend to print warnings even from NMI code. They are pretty
> hard to debug when they lockup the machine and nothing appears in the logs.
> 
> This patch prevents from the deadlock on logbuf_lock by using trylock
> rather than spin_lock. If the lock can not be taken, the message is
> ignored and some warning is printed later on.
> 
> We also must not try to get console from NMI context. It needs
> even more locks and there is even higher chance to hung up.
> 
> Unfortunately, we could not print more details about the lost message.
> We could not alloc a buffer in NMI. Therefore we would need some
> lockless mechanism to share a buffer between NMI and normal context.
> But this would make printk() code much more complicated and
> it is not worth it. There has already been an attempt to do so
> and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
> This is also the reason why we use the atomic counter.

hm, I expect it wouldn't be too messy to shove the text into a static
per-cpu buffer.  So we at least get a few hundred bytes of stuff.

Or maybe just save the address of the control string perhaps, so the
message later comes out without any of its %thingies filled in.  That's
racy against rmmod and requires that the control string be in static
storage (which is basically always true).  Of course, it might just be
"%s" ;).  Forget I suggested this.


> +		/* emit KERN_CRIT message */
> +		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
  +					 NULL, 0, text, text_len);

s/2/LOGLEVEL_CRIT/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-25 12:46 ` [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI Petr Mladek
@ 2015-05-27 23:14   ` Andrew Morton
  2015-05-28  7:54     ` Jiri Kosina
  2015-05-28 13:50     ` Petr Mladek
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2015-05-27 23:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Mon, 25 May 2015 14:46:25 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> If the logbuf_lock is not available immediately, it does not mean
> that there is a deadlock. We should try harder and wait a bit.
> 
> On the other hand, we must not forget that we are in NMI and the timeout
> has to be rather small. It must not cause dangerous stalls.
> 
> I even got full system freeze when the timeout was 10ms and I printed
> backtraces from all CPUs. In this case, all CPUs were blocked for
> too long.
> 
> ...
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -231,6 +231,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
>  
>  #ifdef CONFIG_PRINTK
>  DECLARE_WAIT_QUEUE_HEAD(log_wait);
> +/* cpu currently holding logbuf_lock */
> +static unsigned int logbuf_cpu = UINT_MAX;
>  /* the next printk record to read by syslog(READ) or /proc/kmsg */
>  static u64 syslog_seq;
>  static u32 syslog_idx;
> @@ -1610,6 +1612,38 @@ static size_t cont_print_text(char *text, size_t size)
>  	return textlen;
>  }
>  
> +/*
> + * This value defines the maximum delay that we spend waiting for logbuf_lock
> + * in NMI context. 100us looks like a good compromise. Note that, for example,
> + * syslog_print_all() might hold the lock for quite some time. On the other
> + * hand, waiting 10ms caused system freeze when many backtraces were printed
> + * in NMI.
> + */
> +#define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL
> +
> +/* We must be careful in NMI when we managed to preempt a running printk */
> +static int try_logbuf_lock_in_nmi(void)
> +{
> +	u64 start_time, current_time;
> +	int this_cpu = smp_processor_id();
> +
> +	/* no way if we are already locked on this CPU */
> +	if (logbuf_cpu == this_cpu)
> +		return 0;
> +
> +	/* try hard to get the lock but do not wait forever */
> +	start_time = cpu_clock(this_cpu);
> +	current_time = start_time;
> +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> +		if (raw_spin_trylock(&logbuf_lock))
> +			return 1;
> +		cpu_relax();
> +		current_time = cpu_clock(this_cpu);
> +	}

(Looks at the read_seqcount_retry() in
kernel/time/sched_clock.c:sched_clock())

Running cpu_clock() in NMI context seems a generally bad idea.  Using
an ndelay/udelay wait loop here would be safer?


There are many sites in kernel/printk/printk.c which take logbuf_lock,
but this patch only sets logbuf_cpu in one of those cases:
vprintk_emit().  I suggest adding helper functions to take/release
logbuf_lock.  And rename logbuf_lock to something else to ensure that
nobody accidentally takes the lock directly.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work
  2015-05-25 12:46 ` [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work Petr Mladek
@ 2015-05-27 23:14   ` Andrew Morton
  2015-05-28 13:12     ` Petr Mladek
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2015-05-27 23:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Mon, 25 May 2015 14:46:27 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> It might take ages until users see messages from NMI context. They cannot
> be flushed to the console because the operation involves taking and
> releasing a bunch of locks. Everything gets fixed by the followup printk
> in normal context but it is not predictable.
> 
> The same problem has printk_sched() and this patch reuses the existing
> solution.
> 
> There is no special printk() variant for NMI context. Hence the IRQ work
> need to get queued from vprintk_emit().
> 
> ...
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1554,9 +1554,6 @@ int printk_deferred(const char *fmt, ...)
>  	va_start(args, fmt);
>  	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
>  	va_end(args);
> -
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
>  	preempt_enable();
>  
>  	return r;
> @@ -1880,7 +1877,10 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	 * If called from the scheduler or NMI context, we can not get console
>  	 * without a possible deadlock.
>  	 */
> -	if (!in_sched && !in_nmi()) {
> +	if (in_sched || in_nmi()) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else {

This looks hairy.  Why irq_work_queue() OK to call from NMI? 
arch/arm64/kernel/smp.c uses smp_cross_call() which might use NMI! 
Presumably it'll call directly if the target CPU==this_cpu but I didn't
run around and audit everything.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-27 23:14   ` Andrew Morton
@ 2015-05-28  7:54     ` Jiri Kosina
  2015-05-28 13:50     ` Petr Mladek
  1 sibling, 0 replies; 25+ messages in thread
From: Jiri Kosina @ 2015-05-28  7:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Michal Hocko, Jan Kara,
	linux-kernel, Wang Long, peifeiyue, dzickus, morgan.wang,
	sasha.levin

On Wed, 27 May 2015, Andrew Morton wrote:

> > +static int try_logbuf_lock_in_nmi(void)
> > +{
> > +	u64 start_time, current_time;
> > +	int this_cpu = smp_processor_id();
> > +
> > +	/* no way if we are already locked on this CPU */
> > +	if (logbuf_cpu == this_cpu)
> > +		return 0;
> > +
> > +	/* try hard to get the lock but do not wait forever */
> > +	start_time = cpu_clock(this_cpu);
> > +	current_time = start_time;
> > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > +		if (raw_spin_trylock(&logbuf_lock))
> > +			return 1;
> > +		cpu_relax();
> > +		current_time = cpu_clock(this_cpu);
> > +	}
> 
> (Looks at the read_seqcount_retry() in
> kernel/time/sched_clock.c:sched_clock())
> 
> Running cpu_clock() in NMI context seems a generally bad idea.  

Umm, why? Requirement for it to be NMI-safe is explicitly documented. See 
the corresponding section in Documentation/timers/timekeeping.txt.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/10] printk: Avoid deadlock in NMI context
  2015-05-27 23:13   ` Andrew Morton
@ 2015-05-28 12:00     ` Petr Mladek
  0 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-28 12:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Wed 2015-05-27 16:13:46, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:24 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > printk() cannot be used in NMI context safely because it uses an internal
> > lock and thus could cause a deadlock. This is fine because NMI context
> > is very special. The handlers should be short, effective, and do not
> > use printk().
> > 
> > But developers tend to print warnings even from NMI code. They are pretty
> > hard to debug when they lockup the machine and nothing appears in the logs.
> > 
> > This patch prevents from the deadlock on logbuf_lock by using trylock
> > rather than spin_lock. If the lock can not be taken, the message is
> > ignored and some warning is printed later on.
> > 
> > We also must not try to get console from NMI context. It needs
> > even more locks and there is even higher chance to hung up.
> > 
> > Unfortunately, we could not print more details about the lost message.
> > We could not alloc a buffer in NMI. Therefore we would need some
> > lockless mechanism to share a buffer between NMI and normal context.
> > But this would make printk() code much more complicated and
> > it is not worth it. There has already been an attempt to do so
> > and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
> > This is also the reason why we use the atomic counter.
> 
> hm, I expect it wouldn't be too messy to shove the text into a static
> per-cpu buffer.  So we at least get a few hundred bytes of stuff.

The problem is that we would need to read the static buffer in the normal
context without a lock. The result might be a messy message. Or I
could add some lock-less hackery to keep some consistency but this
would make the code more complex.

In each case, we will not be able to preserve all messages. So, I am
not sure if any more complex solution is worth doing.

Note that we are talking about a corner case. printk() should not be
used in NMI in the first place. If it is used, we still do our best
to get it out. We try to get Oops messages even harder out. If the
message is lost, it might mean some flood of printk()s and
the message might get lost anyway.

> 
> > +		/* emit KERN_CRIT message */
> > +		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
>   +					 NULL, 0, text, text_len);
> 
> s/2/LOGLEVEL_CRIT/

Good point.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work
  2015-05-27 23:14   ` Andrew Morton
@ 2015-05-28 13:12     ` Petr Mladek
  0 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2015-05-28 13:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin, Peter Zijlstra

On Wed 2015-05-27 16:14:23, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:27 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > It might take ages until users see messages from NMI context. They cannot
> > be flushed to the console because the operation involves taking and
> > releasing a bunch of locks. Everything gets fixed by the followup printk
> > in normal context but it is not predictable.
> > 
> > The same problem has printk_sched() and this patch reuses the existing
> > solution.
> > 
> > There is no special printk() variant for NMI context. Hence the IRQ work
> > need to get queued from vprintk_emit().
> > 
> > ...
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1554,9 +1554,6 @@ int printk_deferred(const char *fmt, ...)
> >  	va_start(args, fmt);
> >  	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  	va_end(args);
> > -
> > -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> >  	preempt_enable();
> >  
> >  	return r;
> > @@ -1880,7 +1877,10 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	 * If called from the scheduler or NMI context, we can not get console
> >  	 * without a possible deadlock.
> >  	 */
> > -	if (!in_sched && !in_nmi()) {
> > +	if (in_sched || in_nmi()) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	} else {
> 
> This looks hairy.  Why irq_work_queue() OK to call from NMI?

IMHO, irq_work_queue() was primary created for exactly this job.
I mean to move some work from NMI into a more relaxed context.
See the initial commit e360adbe29241a01 ("irq_work: Add generic
hardirq context callbacks").

In each case, the code seems to be well prepared for this. Especially,
there is a big effort to manipulate work->flags lock-less way.

The only exception is irq_work_queue_on() but it seems to be related
to the particular arch_send_call_function_single_ipi(cpu) call
that is not NMI safe.

> arch/arm64/kernel/smp.c uses smp_cross_call() which might use NMI! 

smp_cross_call() is defined by set_smp_cross_call(). I see
only three functions that are assigned this way:

drivers/irqchip/irq-armada-370-xp.c:            set_smp_cross_call(armada_mpic_send_doorbell);
drivers/irqchip/irq-gic-v3.c:   set_smp_cross_call(gic_raise_softirq);
drivers/irqchip/irq-gic.c:              set_smp_cross_call(gic_raise_softirq);
drivers/irqchip/irq-hip04.c:    set_smp_cross_call(hip04_raise_softirq);

They all seems to work with soft IRQ.


> Presumably it'll call directly if the target CPU==this_cpu but I didn't
> run around and audit everything.

IMHO, this is the case of smp_call_function_single() that has the
function as a parameter. But irq_work_queue() always puts the function
into the work list and sends interrupt.

Of course, it is possible that I have missed something. I hope that
Peter or Frederic could confirm my observation.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-27 23:14   ` Andrew Morton
  2015-05-28  7:54     ` Jiri Kosina
@ 2015-05-28 13:50     ` Petr Mladek
  2015-05-28 20:09       ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2015-05-28 13:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Wed 2015-05-27 16:14:09, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:25 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > If the logbuf_lock is not available immediately, it does not mean
> > that there is a deadlock. We should try harder and wait a bit.
> > 
> > On the other hand, we must not forget that we are in NMI and the timeout
> > has to be rather small. It must not cause dangerous stalls.
> > 
> > I even got full system freeze when the timeout was 10ms and I printed
> > backtraces from all CPUs. In this case, all CPUs were blocked for
> > too long.
> > 
> > ...
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -231,6 +231,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
> >  
> >  #ifdef CONFIG_PRINTK
> >  DECLARE_WAIT_QUEUE_HEAD(log_wait);
> > +/* cpu currently holding logbuf_lock */
> > +static unsigned int logbuf_cpu = UINT_MAX;
> >  /* the next printk record to read by syslog(READ) or /proc/kmsg */
> >  static u64 syslog_seq;
> >  static u32 syslog_idx;
> > @@ -1610,6 +1612,38 @@ static size_t cont_print_text(char *text, size_t size)
> >  	return textlen;
> >  }
> >  
> > +/*
> > + * This value defines the maximum delay that we spend waiting for logbuf_lock
> > + * in NMI context. 100us looks like a good compromise. Note that, for example,
> > + * syslog_print_all() might hold the lock for quite some time. On the other
> > + * hand, waiting 10ms caused system freeze when many backtraces were printed
> > + * in NMI.
> > + */
> > +#define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL
> > +
> > +/* We must be careful in NMI when we managed to preempt a running printk */
> > +static int try_logbuf_lock_in_nmi(void)
> > +{
> > +	u64 start_time, current_time;
> > +	int this_cpu = smp_processor_id();
> > +
> > +	/* no way if we are already locked on this CPU */
> > +	if (logbuf_cpu == this_cpu)
> > +		return 0;
> > +
> > +	/* try hard to get the lock but do not wait forever */
> > +	start_time = cpu_clock(this_cpu);
> > +	current_time = start_time;
> > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > +		if (raw_spin_trylock(&logbuf_lock))
> > +			return 1;
> > +		cpu_relax();
> > +		current_time = cpu_clock(this_cpu);
> > +	}
> 
> (Looks at the read_seqcount_retry() in
> kernel/time/sched_clock.c:sched_clock())
>
> Running cpu_clock() in NMI context seems a generally bad idea.

I am sorry but this is too cryptic for me :-)
read_seqcount_retry() looks safe to me under NMI.

> Using an ndelay/udelay wait loop here would be safer?

My intention was to call raw_spin_trylock() as many times as possible
within a given timeframe. I think that my solution gives a better
control over the maximum time. The CPU has to stay in NMI and
thus busy anyway.

Or do you have some other concern, please?


> There are many sites in kernel/printk/printk.c which take logbuf_lock,
> but this patch only sets logbuf_cpu in one of those cases:
> vprintk_emit().  I suggest adding helper functions to take/release
> logbuf_lock.  And rename logbuf_lock to something else to ensure that
> nobody accidentally takes the lock directly.

IMHO, vprintk_emit() is special. It is the only location where the
lock is taken in NMI context. The other functions are used to dump
@logbuf and are called in normal context.

try_logbuf_lock_in_nmi() could fail and we need to handle the error
path. We do not need to do this in the other locations.

Note that we do not want to get the console in NMI because
there are even more locks that might cause a deadlock.


In each case, thanks a lot for detailed review. It is really appreciated.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-28 13:50     ` Petr Mladek
@ 2015-05-28 20:09       ` Andrew Morton
  2015-05-29 10:56         ` Petr Mladek
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2015-05-28 20:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Thu, 28 May 2015 15:50:54 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> > > +{
> > > +	u64 start_time, current_time;
> > > +	int this_cpu = smp_processor_id();
> > > +
> > > +	/* no way if we are already locked on this CPU */
> > > +	if (logbuf_cpu == this_cpu)
> > > +		return 0;
> > > +
> > > +	/* try hard to get the lock but do not wait forever */
> > > +	start_time = cpu_clock(this_cpu);
> > > +	current_time = start_time;
> > > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > > +		if (raw_spin_trylock(&logbuf_lock))
> > > +			return 1;
> > > +		cpu_relax();
> > > +		current_time = cpu_clock(this_cpu);
> > > +	}
> > 
> > (Looks at the read_seqcount_retry() in
> > kernel/time/sched_clock.c:sched_clock())
> >
> > Running cpu_clock() in NMI context seems a generally bad idea.
> 
> I am sorry but this is too cryptic for me :-)
> read_seqcount_retry() looks safe to me under NMI.

hmpf.  If you guys say so...

Note that it's not just a matter of "safe to call from NMI context". 
The above loop also assume that cpu_clock() is *being updated* within
the context of single NMI.  Is that true/safe now and in the future?
Probably.  I didn't check all architectures but ARM looks OK at present.

We should at least update Documentation/timers/timekeeping.txt: "a sane
value" becomes "the correct value", no alternatives.

> > There are many sites in kernel/printk/printk.c which take logbuf_lock,
> > but this patch only sets logbuf_cpu in one of those cases:
> > vprintk_emit().  I suggest adding helper functions to take/release
> > logbuf_lock.  And rename logbuf_lock to something else to ensure that
> > nobody accidentally takes the lock directly.
> 
> IMHO, vprintk_emit() is special. It is the only location where the
> lock is taken in NMI context. The other functions are used to dump
> @logbuf and are called in normal context.
> 
> try_logbuf_lock_in_nmi() could fail and we need to handle the error
> path. We do not need to do this in the other locations.
> 
> Note that we do not want to get the console in NMI because
> there are even more locks that might cause a deadlock.

Consider the case where a CPU has taken logbuf_lock within
devkmsg_read() and then receives an NMI, from which it calls
try_logbuf_lock_in_nmi():

> +/* We must be careful in NMI when we managed to preempt a running printk */
> +static int try_logbuf_lock_in_nmi(void)
> +{
> +	u64 start_time, current_time;
> +	int this_cpu = smp_processor_id();
> +
> +	/* no way if we are already locked on this CPU */
> +	if (logbuf_cpu == this_cpu)
> +		return 0;
> +
> +	/* try hard to get the lock but do not wait forever */
> +	start_time = cpu_clock(this_cpu);
> +	current_time = start_time;
> +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> +		if (raw_spin_trylock(&logbuf_lock))
> +			return 1;
> +		cpu_relax();
> +		current_time = cpu_clock(this_cpu);
> +	}
> +
> +	return 0;
> +}

That CPU is now going to spin around for 100us and then time out.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-28 20:09       ` Andrew Morton
@ 2015-05-29 10:56         ` Petr Mladek
  2015-05-29 20:46           ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2015-05-29 10:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Thu 2015-05-28 13:09:44, Andrew Morton wrote:
> On Thu, 28 May 2015 15:50:54 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > > > +{
> > > > +	u64 start_time, current_time;
> > > > +	int this_cpu = smp_processor_id();
> > > > +
> > > > +	/* no way if we are already locked on this CPU */
> > > > +	if (logbuf_cpu == this_cpu)
> > > > +		return 0;
> > > > +
> > > > +	/* try hard to get the lock but do not wait forever */
> > > > +	start_time = cpu_clock(this_cpu);
> > > > +	current_time = start_time;
> > > > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > > > +		if (raw_spin_trylock(&logbuf_lock))
> > > > +			return 1;
> > > > +		cpu_relax();
> > > > +		current_time = cpu_clock(this_cpu);
> > > > +	}
> > > 
> > > (Looks at the read_seqcount_retry() in
> > > kernel/time/sched_clock.c:sched_clock())
> > >
> > > Running cpu_clock() in NMI context seems a generally bad idea.
> > 
> > I am sorry but this is too cryptic for me :-)
> > read_seqcount_retry() looks safe to me under NMI.
> 
> hmpf.  If you guys say so...
> 
> Note that it's not just a matter of "safe to call from NMI context". 
> The above loop also assume that cpu_clock() is *being updated* within
> the context of single NMI.  Is that true/safe now and in the future?
> Probably.  I didn't check all architectures but ARM looks OK at present.
> 
> We should at least update Documentation/timers/timekeeping.txt: "a sane
> value" becomes "the correct value", no alternatives.
> 
> > > There are many sites in kernel/printk/printk.c which take logbuf_lock,
> > > but this patch only sets logbuf_cpu in one of those cases:
> > > vprintk_emit().  I suggest adding helper functions to take/release
> > > logbuf_lock.  And rename logbuf_lock to something else to ensure that
> > > nobody accidentally takes the lock directly.
> > 
> > IMHO, vprintk_emit() is special. It is the only location where the
> > lock is taken in NMI context. The other functions are used to dump
> > @logbuf and are called in normal context.
> > 
> > try_logbuf_lock_in_nmi() could fail and we need to handle the error
> > path. We do not need to do this in the other locations.
> > 
> > Note that we do not want to get the console in NMI because
> > there are even more locks that might cause a deadlock.
> 
> Consider the case where a CPU has taken logbuf_lock within
> devkmsg_read() and then receives an NMI, from which it calls
> try_logbuf_lock_in_nmi():

I am not sure that I understand. My point is that we do not call
devkmsg_read() from NMI context, so we do not need to use
try_logbuf_lock_in_nmi() there. IMHO, the same is true for
all other locations except for vprintk_emit().


> > +/* We must be careful in NMI when we managed to preempt a running printk */
> > +static int try_logbuf_lock_in_nmi(void)
> > +{
> > +	u64 start_time, current_time;
> > +	int this_cpu = smp_processor_id();
> > +
> > +	/* no way if we are already locked on this CPU */
> > +	if (logbuf_cpu == this_cpu)
> > +		return 0;

Or do you have this check in mind? It will detect the deadlock
immediately but @logbuf_cpu is set only in vprintk_emit(). We
will spin when NMI comes inside the other functions,
e.g. devkmsg_read().


> > +	/* try hard to get the lock but do not wait forever */
> > +	start_time = cpu_clock(this_cpu);
> > +	current_time = start_time;
> > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > +		if (raw_spin_trylock(&logbuf_lock))
> > +			return 1;
> > +		cpu_relax();
> > +		current_time = cpu_clock(this_cpu);
> > +	}
> > +
> > +	return 0;
> > +}
> 
> That CPU is now going to spin around for 100us and then time out.

Yes, there was a deadlock without the patch. So, limited spinning is
still a win.

Or would you like to detect the deadlock immediately in all cases?
I mean to add the proposed wrapper around take/release lock calls
and set/test some cpu-specific variable there?

It sounds interesting. Well, the detection will not be 100% correct
because there is a small race window between taking @logbuf_lock
and setting @lockbuf_cpu. I wonder if it is worth doing. But I will
do it if you want.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI
  2015-05-29 10:56         ` Petr Mladek
@ 2015-05-29 20:46           ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2015-05-29 20:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Fri, 29 May 2015 12:56:07 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> > > +	/* try hard to get the lock but do not wait forever */
> > > +	start_time = cpu_clock(this_cpu);
> > > +	current_time = start_time;
> > > +	while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
> > > +		if (raw_spin_trylock(&logbuf_lock))
> > > +			return 1;
> > > +		cpu_relax();
> > > +		current_time = cpu_clock(this_cpu);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > 
> > That CPU is now going to spin around for 100us and then time out.
> 
> Yes, there was a deadlock without the patch. So, limited spinning is
> still a win.
> 
> Or would you like to detect the deadlock immediately in all cases?
> I mean to add the proposed wrapper around take/release lock calls
> and set/test some cpu-specific variable there?

Yes.  Pointlessly spinning in NMI for 100us is bad.

> It sounds interesting. Well, the detection will not be 100% correct
> because there is a small race window between taking @logbuf_lock
> and setting @lockbuf_cpu. I wonder if it is worth doing. But I will
> do it if you want.

You might be able to do something with checking logbuf_cpu within the
loop to avoid the worst-case scenarios.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup
  2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
                   ` (9 preceding siblings ...)
  2015-05-25 12:46 ` [PATCH 10/10] printk: Split console call " Petr Mladek
@ 2015-05-29 20:50 ` Andrew Morton
  2015-06-01 13:06   ` Jan Kara
  10 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2015-05-29 20:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Mon, 25 May 2015 14:46:23 +0200 Petr Mladek <pmladek@suse.cz> wrote:

> The main source of deadlocks caused by printk() in NMI context has been
> solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
> trace on all CPUs").
> 
> But there are still few warnings printed in the NMI code that could
> case a deadlock. For example, see the freeze discussed at
> https://lkml.org/lkml/2015/5/20/481

I'm not (yet) convinced that we want the entire patchset btw.  Do we
really want to try to semi-support printk from NMI?  With a rather
nasty set of hacks?

Why not just delete the offending printks?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup
  2015-05-29 20:50 ` [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Andrew Morton
@ 2015-06-01 13:06   ` Jan Kara
  2015-06-02  9:46     ` long.wanglong
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2015-06-01 13:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Frederic Weisbecker, Steven Rostedt, Dave Anderson,
	Paul E. McKenney, Kay Sievers, Jiri Kosina, Michal Hocko,
	Jan Kara, linux-kernel, Wang Long, peifeiyue, dzickus,
	morgan.wang, sasha.levin

On Fri 29-05-15 13:50:45, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:23 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > The main source of deadlocks caused by printk() in NMI context has been
> > solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
> > trace on all CPUs").
> > 
> > But there are still few warnings printed in the NMI code that could
> > case a deadlock. For example, see the freeze discussed at
> > https://lkml.org/lkml/2015/5/20/481
> 
> I'm not (yet) convinced that we want the entire patchset btw.  Do we
> really want to try to semi-support printk from NMI?  With a rather
> nasty set of hacks?
> 
> Why not just delete the offending printks?
  And what about WARN_ONs and BUG_ONs? Delete as well? Or just don't print
anything when we are in NMI? I agree that NMI is so problematic context
that restricting printk there makes some sence. OTOH propagating
information from NMI to user is useful as well so I'm somewhat undecided.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup
  2015-06-01 13:06   ` Jan Kara
@ 2015-06-02  9:46     ` long.wanglong
  2015-06-02  9:52       ` Jiri Kosina
  0 siblings, 1 reply; 25+ messages in thread
From: long.wanglong @ 2015-06-02  9:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, Frederic Weisbecker, Steven Rostedt,
	Dave Anderson, Paul E. McKenney, Kay Sievers, Jiri Kosina,
	Michal Hocko, linux-kernel, peifeiyue, dzickus, morgan.wang,
	sasha.levin, xuhanbing

On 2015/6/1 21:06, Jan Kara wrote:
> On Fri 29-05-15 13:50:45, Andrew Morton wrote:
>> On Mon, 25 May 2015 14:46:23 +0200 Petr Mladek <pmladek@suse.cz> wrote:
>>
>>> The main source of deadlocks caused by printk() in NMI context has been
>>> solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
>>> trace on all CPUs").
>>>
>>> But there are still few warnings printed in the NMI code that could
>>> case a deadlock. For example, see the freeze discussed at
>>> https://lkml.org/lkml/2015/5/20/481
>>
>> I'm not (yet) convinced that we want the entire patchset btw.  Do we
>> really want to try to semi-support printk from NMI?  With a rather
>> nasty set of hacks?
>>
>> Why not just delete the offending printks?
>   And what about WARN_ONs and BUG_ONs? Delete as well? Or just don't print
> anything when we are in NMI? I agree that NMI is so problematic context
> that restricting printk there makes some sence. OTOH propagating
> information from NMI to user is useful as well so I'm somewhat undecided.
> 
> 								Honza
> 

I think that delete all printks in NMI context is not a good solution.
because some information is very useful to user.

The commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack trace on all CPUs")
solved the deadlock problem only in arch_trigger_all_cpu_backtrace_handler() by
replacing the printk to a special print function (nmi_vprintk). But all the
other NMI handlers still use the original printk.

How about replacing printk function earlier? we can replace printk function
before we calling default_do_nmi(arch/x86/kernel/nmi.c) and replace back
after calling.

Is it a feasible solution? or does it introduce other problems?

Best Regards
Wang Long


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup
  2015-06-02  9:46     ` long.wanglong
@ 2015-06-02  9:52       ` Jiri Kosina
  0 siblings, 0 replies; 25+ messages in thread
From: Jiri Kosina @ 2015-06-02  9:52 UTC (permalink / raw)
  To: long.wanglong
  Cc: Jan Kara, Andrew Morton, Petr Mladek, Frederic Weisbecker,
	Steven Rostedt, Dave Anderson, Paul E. McKenney, Kay Sievers,
	Michal Hocko, linux-kernel, peifeiyue, dzickus, morgan.wang,
	sasha.levin, xuhanbing

On Tue, 2 Jun 2015, long.wanglong wrote:

> How about replacing printk function earlier? we can replace printk 
> function before we calling default_do_nmi(arch/x86/kernel/nmi.c) and 
> replace back after calling.
> 
> Is it a feasible solution? or does it introduce other problems?

This has been already discussed in the past when we were fixing the 
NMI-based stackdumping. See

	http://marc.info/?l=linux-kernel&m=141640218702588&w=2

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-06-02  9:53 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-25 12:46 [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Petr Mladek
2015-05-25 12:46 ` [PATCH 01/10] printk: Avoid deadlock in NMI context Petr Mladek
2015-05-27 23:13   ` Andrew Morton
2015-05-28 12:00     ` Petr Mladek
2015-05-25 12:46 ` [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI Petr Mladek
2015-05-27 23:14   ` Andrew Morton
2015-05-28  7:54     ` Jiri Kosina
2015-05-28 13:50     ` Petr Mladek
2015-05-28 20:09       ` Andrew Morton
2015-05-29 10:56         ` Petr Mladek
2015-05-29 20:46           ` Andrew Morton
2015-05-25 12:46 ` [PATCH 03/10] printk: Move the deferred printk stuff Petr Mladek
2015-05-25 12:46 ` [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work Petr Mladek
2015-05-27 23:14   ` Andrew Morton
2015-05-28 13:12     ` Petr Mladek
2015-05-25 12:46 ` [PATCH 05/10] printk: Try hard to print Oops message in NMI context Petr Mladek
2015-05-25 12:46 ` [PATCH 06/10] printk: Split delayed printing of warnings from vprintk_emit() Petr Mladek
2015-05-25 12:46 ` [PATCH 07/10] printk: Split text formatting and analyze " Petr Mladek
2015-05-25 12:46 ` [PATCH 08/10] printk: Detect scheduler messages in vprintk_format_and_analyze() Petr Mladek
2015-05-25 12:46 ` [PATCH 09/10] printk: Split text storing logic from vprintk_emit() Petr Mladek
2015-05-25 12:46 ` [PATCH 10/10] printk: Split console call " Petr Mladek
2015-05-29 20:50 ` [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup Andrew Morton
2015-06-01 13:06   ` Jan Kara
2015-06-02  9:46     ` long.wanglong
2015-06-02  9:52       ` Jiri Kosina

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.