* [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
@ 2016-09-27 14:22 Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 1/7] printk: use vprintk_func in vprintk() Sergey Senozhatsky
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

	Hello,

	RFC

	This patch set extends the lock-less NMI per-cpu buffers idea to
handle recursive printk() calls. The basic mechanism is pretty much the
same -- at the beginning of a deadlock-prone section we switch to a
lock-less printk callback and return to the default printk implementation
at the end; the buffered messages are flushed to the logbuf buffer from
a safer context.

Deadlock scenarios that alt_printk can handle:

a) printk recursion from logbuf_lock spin_lock section in printk()
  printk()
    raw_spin_lock(&logbuf_lock);
    WARN_ON(1);
    raw_spin_unlock(&logbuf_lock);

b) printk from sem->lock spin_lock section
  printk()
    console_trylock()
      down_trylock()
        raw_spin_lock_irqsave(&sem->lock, flags);
        WARN_ON(1);
        raw_spin_unlock_irqrestore(&sem->lock, flags);

c) printk from logbuf_lock spin_lock section in console_unlock()
  printk()
    console_unlock()
      raw_spin_lock(&logbuf_lock);
      WARN_ON(1);
      raw_spin_unlock(&logbuf_lock);

d) printk from ->pi_lock from semaphore up
  printk()
    console_unlock()
      up()
        try_to_wake_up()
          raw_spin_lock_irqsave(&p->pi_lock, flags);
          WARN_ON(1);
          raw_spin_unlock_irqrestore(&p->pi_lock, flags);

and some others.
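
The general pattern, as an illustrative sketch only (simplified from the
patches below and ignoring the NMI nesting details), is to bracket a
deadlock-prone region so that any nested printk() lands in a per-CPU
buffer instead of taking logbuf_lock or sem->lock again:

  local_irq_save(flags);
  alt_printk_enter();              /* printk() now goes to a per-CPU buffer */
  raw_spin_lock(&logbuf_lock);
  ...                              /* a WARN_ON() here can no longer deadlock */
  raw_spin_unlock(&logbuf_lock);
  alt_printk_exit();               /* back to vprintk_default() */
  local_irq_restore(flags);

The buffered messages are later flushed to the logbuf via irq_work.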

[[against next-20160927]]


Sergey Senozhatsky (7):
  printk: use vprintk_func in vprintk()
  printk: rename nmi.c and exported api
  printk: introduce per-cpu alt_print seq buffer
  printk: make alt_printk available when config printk set
  printk: drop vprintk_func function
  printk: use alternative printk buffers
  printk: new printk() recursion detection

 include/linux/printk.h     |  14 +-
 init/Kconfig               |  16 +-
 init/main.c                |   2 +-
 kernel/kexec_core.c        |   2 +-
 kernel/panic.c             |   4 +-
 kernel/printk/Makefile     |   2 +-
 kernel/printk/alt_printk.c | 432 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/internal.h   |  26 ++-
 kernel/printk/nmi.c        | 268 ----------------------------
 kernel/printk/printk.c     |  67 +++----
 lib/nmi_backtrace.c        |   2 +-
 11 files changed, 495 insertions(+), 340 deletions(-)
 create mode 100644 kernel/printk/alt_printk.c
 delete mode 100644 kernel/printk/nmi.c

-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 1/7] printk: use vprintk_func in vprintk()
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 2/7] printk: rename nmi.c and exported api Sergey Senozhatsky
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

vprintk(), just like printk(), should go through the per-cpu printk_func
pointer instead of calling vprintk_emit() directly, in case vprintk()
is ever called from NMI, or from any other context that can deadlock
in printk().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8019cc0..c7ffbef 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1922,7 +1922,7 @@ EXPORT_SYMBOL(vprintk_emit);
 
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
-	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
+	return vprintk_func(fmt, args);
 }
 EXPORT_SYMBOL(vprintk);
 
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 2/7] printk: rename nmi.c and exported api
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 1/7] printk: use vprintk_func in vprintk() Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer Sergey Senozhatsky
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

A preparation patch for the alt_printk work. No functional change.
- rename nmi.c to alt_printk.c
- rename the exported functions to have an alt_printk prefix.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/printk.h                | 12 +++----
 init/Kconfig                          | 16 +++++----
 init/main.c                           |  2 +-
 kernel/kexec_core.c                   |  2 +-
 kernel/panic.c                        |  4 +--
 kernel/printk/Makefile                |  2 +-
 kernel/printk/{nmi.c => alt_printk.c} | 61 ++++++++++++++++++-----------------
 lib/nmi_backtrace.c                   |  2 +-
 8 files changed, 52 insertions(+), 49 deletions(-)
 rename kernel/printk/{nmi.c => alt_printk.c} (80%)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 696a56b..7510613 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -131,17 +131,17 @@ void early_printk(const char *s, ...) { }
 #endif
 
 #ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_init(void);
+extern void alt_printk_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
-extern void printk_nmi_flush(void);
-extern void printk_nmi_flush_on_panic(void);
+extern void alt_printk_flush(void);
+extern void alt_printk_flush_on_panic(void);
 #else
-static inline void printk_nmi_init(void) { }
+static inline void alt_printk_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_flush(void) { }
-static inline void printk_nmi_flush_on_panic(void) { }
+static inline void alt_printk_flush(void) { }
+static inline void alt_printk_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 34407f1..434807f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -875,17 +875,19 @@ config LOG_CPU_MAX_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
-config NMI_LOG_BUF_SHIFT
-	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+config ALT_PRINTK_LOG_BUF_SHIFT
+	int "Temporary per-CPU printk log buffer size (12 => 4KB, 13 => 8KB)"
 	range 10 21
 	default 13
-	depends on PRINTK_NMI
+	depends on PRINTK
 	help
-	  Select the size of a per-CPU buffer where NMI messages are temporary
-	  stored. They are copied to the main log buffer in a safe context
-	  to avoid a deadlock. The value defines the size as a power of 2.
+	  Select the size of an alternate printk per-CPU buffer where messages
+	  printed from unsafe contexts are temporarily stored. One example would
+	  be NMI messages; another is printk recursion. The messages are
+	  copied to the main log buffer in a safe context to avoid a deadlock.
+	  The value defines the size as a power of 2.
 
-	  NMI messages are rare and limited. The largest one is when
+	  Those messages are rare and limited. The largest one is when
 	  a backtrace is printed. It usually fits into 4KB. Select
 	  8KB if you want to be on the safe side.
 
diff --git a/init/main.c b/init/main.c
index 9af9274..3dab03c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -577,7 +577,7 @@ asmlinkage __visible void __init start_kernel(void)
 	timekeeping_init();
 	time_init();
 	sched_clock_postinit();
-	printk_nmi_init();
+	alt_printk_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a865969..a307fa3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -909,7 +909,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_nmi_flush_on_panic();
+		alt_printk_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index e6480e2..28f97ce 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -188,7 +188,7 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_nmi_flush_on_panic();
+		alt_printk_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -213,7 +213,7 @@ void panic(const char *fmt, ...)
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
 	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_nmi_flush_on_panic();
+	alt_printk_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index abb0042..3fc5e4e 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,3 +1,3 @@
 obj-y	= printk.o
-obj-$(CONFIG_PRINTK_NMI)		+= nmi.o
+obj-$(CONFIG_PRINTK_NMI)		+= alt_printk.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/nmi.c b/kernel/printk/alt_printk.c
similarity index 80%
rename from kernel/printk/nmi.c
rename to kernel/printk/alt_printk.c
index 16bab47..7178661 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/alt_printk.c
@@ -1,5 +1,5 @@
 /*
- * nmi.c - Safe printk in NMI context
+ * alt_printk.c - Safe printk in NMI context
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -39,18 +39,18 @@
  * were handled or when IRQs are blocked.
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-static int printk_nmi_irq_ready;
+static int alt_printk_irq_ready;
 atomic_t nmi_message_lost;
 
-#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) -		\
+#define ALT_LOG_BUF_LEN ((1 << CONFIG_ALT_PRINTK_LOG_BUF_SHIFT) -	\
 			 sizeof(atomic_t) - sizeof(struct irq_work))
 
-struct nmi_seq_buf {
+struct alt_printk_seq_buf {
 	atomic_t		len;	/* length of written data */
 	struct irq_work		work;	/* IRQ work that flushes the buffer */
-	unsigned char		buffer[NMI_LOG_BUF_LEN];
+	unsigned char		buffer[ALT_LOG_BUF_LEN];
 };
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+static DEFINE_PER_CPU(struct alt_printk_seq_buf, nmi_print_seq);
 
 /*
  * Safe printk() for NMI context. It uses a per-CPU buffer to
@@ -60,7 +60,7 @@ static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
  */
 static int vprintk_nmi(const char *fmt, va_list args)
 {
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	struct alt_printk_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 	int add = 0;
 	size_t len;
 
@@ -90,7 +90,7 @@ static int vprintk_nmi(const char *fmt, va_list args)
 		goto again;
 
 	/* Get flushed in a more safe context. */
-	if (add && printk_nmi_irq_ready) {
+	if (add && alt_printk_irq_ready) {
 		/* Make sure that IRQ work is really initialized. */
 		smp_rmb();
 		irq_work_queue(&s->work);
@@ -99,7 +99,7 @@ static int vprintk_nmi(const char *fmt, va_list args)
 	return add;
 }
 
-static void printk_nmi_flush_line(const char *text, int len)
+static void alt_printk_flush_line(const char *text, int len)
 {
 	/*
 	 * The buffers are flushed in NMI only on panic.  The messages must
@@ -117,23 +117,24 @@ static void printk_nmi_flush_line(const char *text, int len)
  * printk one line from the temporary buffer from @start index until
  * and including the @end index.
  */
-static void printk_nmi_flush_seq_line(struct nmi_seq_buf *s,
+static void alt_printk_flush_seq_line(struct alt_printk_seq_buf *s,
 					int start, int end)
 {
 	const char *buf = s->buffer + start;
 
-	printk_nmi_flush_line(buf, (end - start) + 1);
+	alt_printk_flush_line(buf, (end - start) + 1);
 }
 
 /*
  * Flush data from the associated per_CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_nmi_flush(struct irq_work *work)
+static void __alt_printk_flush(struct irq_work *work)
 {
 	static raw_spinlock_t read_lock =
 		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+	struct alt_printk_seq_buf *s = container_of(work,
+			struct alt_printk_seq_buf, work);
 	unsigned long flags;
 	size_t len, size;
 	int i, last_i;
@@ -157,9 +158,9 @@ static void __printk_nmi_flush(struct irq_work *work)
 	 * @len must only increase.
 	 */
 	if (i && i >= len) {
-		const char *msg = "printk_nmi_flush: internal error\n";
+		const char *msg = "alt_printk_flush: internal error\n";
 
-		printk_nmi_flush_line(msg, strlen(msg));
+		alt_printk_flush_line(msg, strlen(msg));
 	}
 
 	if (!len)
@@ -174,14 +175,14 @@ static void __printk_nmi_flush(struct irq_work *work)
 	/* Print line by line. */
 	for (; i < size; i++) {
 		if (s->buffer[i] == '\n') {
-			printk_nmi_flush_seq_line(s, last_i, i);
+			alt_printk_flush_seq_line(s, last_i, i);
 			last_i = i + 1;
 		}
 	}
 	/* Check if there was a partial line. */
 	if (last_i < size) {
-		printk_nmi_flush_seq_line(s, last_i, size - 1);
-		printk_nmi_flush_line("\n", strlen("\n"));
+		alt_printk_flush_seq_line(s, last_i, size - 1);
+		alt_printk_flush_line("\n", strlen("\n"));
 	}
 
 	/*
@@ -198,31 +199,31 @@ static void __printk_nmi_flush(struct irq_work *work)
 }
 
 /**
- * printk_nmi_flush - flush all per-cpu nmi buffers.
+ * alt_printk_flush - flush all per-cpu nmi buffers.
  *
  * The buffers are flushed automatically via IRQ work. This function
  * is useful only when someone wants to be sure that all buffers have
  * been flushed at some point.
  */
-void printk_nmi_flush(void)
+void alt_printk_flush(void)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+		__alt_printk_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
 /**
- * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ * alt_printk_flush_on_panic - flush all per-cpu nmi buffers when the system
  *	goes down.
  *
- * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * Similar to alt_printk_flush() but it can be called even in NMI context when
  * the system goes down. It does the best effort to get NMI messages into
  * the main ring buffer.
  *
  * Note that it could try harder when there is only one CPU online.
  */
-void printk_nmi_flush_on_panic(void)
+void alt_printk_flush_on_panic(void)
 {
 	/*
 	 * Make sure that we could access the main ring buffer.
@@ -236,25 +237,25 @@ void printk_nmi_flush_on_panic(void)
 		raw_spin_lock_init(&logbuf_lock);
 	}
 
-	printk_nmi_flush();
+	alt_printk_flush();
 }
 
-void __init printk_nmi_init(void)
+void __init alt_printk_init(void)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+		struct alt_printk_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
 
-		init_irq_work(&s->work, __printk_nmi_flush);
+		init_irq_work(&s->work, __alt_printk_flush);
 	}
 
 	/* Make sure that IRQ works are initialized before enabling. */
 	smp_wmb();
-	printk_nmi_irq_ready = 1;
+	alt_printk_irq_ready = 1;
 
 	/* Flush pending messages that did not have scheduled IRQ works. */
-	printk_nmi_flush();
+	alt_printk_flush();
 }
 
 void printk_nmi_enter(void)
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 7555475..257ab1e 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -77,7 +77,7 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 	 * Force flush any remote buffers that might be stuck in IRQ context
 	 * and therefore could not run their irq_work.
 	 */
-	printk_nmi_flush();
+	alt_printk_flush();
 
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 1/7] printk: use vprintk_func in vprintk() Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 2/7] printk: rename nmi.c and exported api Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-29 12:26   ` Petr Mladek
  2016-09-27 14:22 ` [RFC][PATCH 4/7] printk: make alt_printk available when config printk set Sergey Senozhatsky
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

This patch extends the idea of NMI per-cpu buffers to regions
that may cause recursive printk() calls and possible deadlocks.
Namely, printk() can't handle printk() calls from scheduler code
or from lock debugging code (spin_dump(), for instance), because
those may be called with `sem->lock' or one of the critical
scheduler locks (p->pi_lock, etc.) already taken. An example of
such a deadlock:

 vprintk_emit()
  console_unlock()
   up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
    wake_up_process()
     try_to_wake_up()
      ttwu_queue()
       ttwu_do_activate()
        ttwu_activate()
         activate_task()
          enqueue_task()
           enqueue_task_fair()
            cfs_rq_of()
             task_of()
              WARN_ON_ONCE(!entity_is_task(se))
               vprintk_emit()
                console_trylock()
                 down_trylock()
                  raw_spin_lock_irqsave(&sem->lock, flags)
                  ^^^^ deadlock

and some other cases.

Just like in the NMI implementation, the solution uses a per-cpu
`printk_func' pointer to 'redirect' printk() calls to a 'safe'
callback that stores messages in a per-cpu buffer and flushes
them to the logbuf buffer later.

Usage example:

 printk()
  local_irq_save()
  alt_printk_enter()
  //
  //  any printk() call from here will end up in vprintk_alt(),
  //  which stores messages in a special per-CPU buffer.
  //
  alt_printk_exit()
  local_irq_restore()

This patch only adds the alt_printk support; we don't use it yet.
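
For reference, the redirection itself is just an indirect call through the
per-cpu `printk_func' pointer; a sketch assembled from the hunks in patches
1, 2 and 5 (not new code added here):

  DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;

  asmlinkage int vprintk(const char *fmt, va_list args)
  {
  	return this_cpu_read(printk_func)(fmt, args);
  }

so alt_printk_enter()/alt_printk_exit() only need to save and restore the
value of that pointer.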

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/alt_printk.c | 160 +++++++++++++++++++++++++++++++++++++++++----
 kernel/printk/internal.h   |  12 ++++
 2 files changed, 158 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/alt_printk.c b/kernel/printk/alt_printk.c
index 7178661..8978390 100644
--- a/kernel/printk/alt_printk.c
+++ b/kernel/printk/alt_printk.c
@@ -1,5 +1,5 @@
 /*
- * alt_printk.c - Safe printk in NMI context
+ * alt_printk.c - Safe printk for printk-deadlock-prone contexts
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -53,24 +53,39 @@ struct alt_printk_seq_buf {
 static DEFINE_PER_CPU(struct alt_printk_seq_buf, nmi_print_seq);
 
 /*
- * Safe printk() for NMI context. It uses a per-CPU buffer to
- * store the message. NMIs are not nested, so there is always only
- * one writer running. But the buffer might get flushed from another
- * CPU, so we need to be careful.
+ * There can be two alt_printk contexts at most - a `normal' alt_printk
+ * and NMI alt_printk context. Normal alt_printk context is the one that
+ * direct caller of printk() setups (either a process or IRQ) and it can
+ * be preempted only by NMI (if the platform supports NMI). NMI context
+ * can preempt normal alt_printk context, but cannot be preempted on its
+ * own.
  */
-static int vprintk_nmi(const char *fmt, va_list args)
+#ifdef CONFIG_PRINTK_NMI
+#define MAX_ALT_PRINTK_CTX	2
+#else
+#define MAX_ALT_PRINTK_CTX	1
+#endif
+
+struct alt_printk_ctx {
+	atomic_t	idx;
+	unsigned int	entry_count;
+	printk_func_t	saved_printk_func[MAX_ALT_PRINTK_CTX];
+};
+
+static DEFINE_PER_CPU(struct alt_printk_seq_buf, alt_print_seq);
+static DEFINE_PER_CPU(struct alt_printk_ctx, alt_printk_ctx);
+
+static int alt_printk_log_store(struct alt_printk_seq_buf *s,
+		const char *fmt, va_list args)
 {
-	struct alt_printk_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	int add = 0;
+	int add;
 	size_t len;
 
 again:
 	len = atomic_read(&s->len);
 
-	if (len >= sizeof(s->buffer)) {
-		atomic_inc(&nmi_message_lost);
+	if (len >= sizeof(s->buffer))
 		return 0;
-	}
 
 	/*
 	 * Make sure that all old data have been read before the buffer was
@@ -99,6 +114,96 @@ static int vprintk_nmi(const char *fmt, va_list args)
 	return add;
 }
 
+/*
+ * Lockless printk(), to avoid deadlocks should the printk() recurse
+ * into itself. It uses a per-CPU buffer to store the message, just like
+ * NMI.
+ */
+static int vprintk_alt(const char *fmt, va_list args)
+{
+	struct alt_printk_seq_buf *s = this_cpu_ptr(&alt_print_seq);
+
+	return alt_printk_log_store(s, fmt, args);
+}
+
+/*
+ * We must keep the track of `printk_func' because alt_printk
+ * context can be preempted by NMI alt_printk context.
+ *
+ * Consider the following example:
+ *
+ * vprintk_emit()
+ * 	alt_printk_enter()
+ * 		printk_func = vprintk_alt;
+ *
+ * -> NMI
+ *  	printk_nmi_enter()
+ *		printk_func = vprintk_nmi;
+ *	printk_nmi_exit()
+ *		printk_func = vprintk_default;
+ *		^^^^^^^^^^^
+ * <- NMI
+ *
+ * 	printk("foo") -> vprintk_default();
+ *
+ * Thus we must restore the orignal `printk_func' value, the one
+ * NMI saw at printk_nmi_enter() time.
+ */
+static void __lockless_printk_enter(printk_func_t new_func)
+{
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
+	int idx = atomic_inc_return(&ctx->idx) - 1;
+
+	ctx->saved_printk_func[idx] = this_cpu_read(printk_func);
+	this_cpu_write(printk_func, new_func);
+}
+
+static void __lockless_printk_exit(void)
+{
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
+	int idx = atomic_read(&ctx->idx) - 1;
+
+	this_cpu_write(printk_func, ctx->saved_printk_func[idx]);
+	atomic_dec(&ctx->idx);
+}
+
+/* Local IRQs must be disabled; can be preempted by NMI. */
+void alt_printk_enter(void)
+{
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
+
+	/*
+	 * We can't switch `printk_func' while the CPU is flushing its
+	 * alternative buffers. At the same time, we leave flushing
+	 * `unprotected', because we always use vprintk_default() there.
+	 *
+	 * ->entry_count can detect printk() recursion from flushing context:
+	 *  -- alt_printk_flush() sets ->entry_count to 1
+	 *  -- every vprintk_default() call from alt_printk_flush() increments
+	 *     ->entry_count to 2 when it enters the recursion un-safe region
+	 *     and decrements it back to 1 when it leaves that region
+	 *  -- thus, if printk() will recurs from recursion un-safe region, we
+	 *     will see ->entry_count > 2.
+	 */
+	ctx->entry_count++;
+	if (ctx->entry_count > 1)
+		return;
+
+	/* @TODO: do something sensible in case of printk() recursion */
+
+	__lockless_printk_enter(vprintk_alt);
+}
+
+/* Local IRQs must be disabled; can be preempted by NMI. */
+void alt_printk_exit(void)
+{
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
+
+	if (ctx->entry_count == 1)
+		__lockless_printk_exit();
+	ctx->entry_count--;
+}
+
 static void alt_printk_flush_line(const char *text, int len)
 {
 	/*
@@ -110,7 +215,6 @@ static void alt_printk_flush_line(const char *text, int len)
 		printk_deferred("%.*s", len, text);
 	else
 		printk("%.*s", len, text);
-
 }
 
 /*
@@ -135,6 +239,7 @@ static void __alt_printk_flush(struct irq_work *work)
 		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
 	struct alt_printk_seq_buf *s = container_of(work,
 			struct alt_printk_seq_buf, work);
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
 	unsigned long flags;
 	size_t len, size;
 	int i, last_i;
@@ -147,6 +252,11 @@ static void __alt_printk_flush(struct irq_work *work)
 	 * a backtrace.
 	 */
 	raw_spin_lock_irqsave(&read_lock, flags);
+	/*
+	 * Forbid the alt_printk on this CPU, we want to flush messages to
+	 * logbuf, not to alt_printk buffer again.
+	 */
+	ctx->entry_count++;
 
 	i = 0;
 more:
@@ -195,6 +305,7 @@ static void __alt_printk_flush(struct irq_work *work)
 		goto more;
 
 out:
+	ctx->entry_count--;
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
@@ -248,6 +359,9 @@ void __init alt_printk_init(void)
 		struct alt_printk_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
 
 		init_irq_work(&s->work, __alt_printk_flush);
+
+		s = &per_cpu(alt_print_seq, cpu);
+		init_irq_work(&s->work, __alt_printk_flush);
 	}
 
 	/* Make sure that IRQ works are initialized before enabling. */
@@ -258,12 +372,30 @@ void __init alt_printk_init(void)
 	alt_printk_flush();
 }
 
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+	struct alt_printk_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	int add;
+
+	add = alt_printk_log_store(s, fmt, args);
+	if (!add)
+		atomic_inc(&nmi_message_lost);
+
+	return add;
+}
+
 void printk_nmi_enter(void)
 {
-	this_cpu_write(printk_func, vprintk_nmi);
+	__lockless_printk_enter(vprintk_nmi);
 }
 
 void printk_nmi_exit(void)
 {
-	this_cpu_write(printk_func, vprintk_default);
+	__lockless_printk_exit();
 }
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..c5f970e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -20,6 +20,18 @@ typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
 
 int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
+#ifdef CONFIG_PRINTK
+
+void alt_printk_enter(void);
+void alt_printk_exit(void);
+
+#else
+
+void alt_printk_enter(void) { }
+void alt_printk_exit(void) { }
+
+#endif /* CONFIG_PRINTK */
+
 #ifdef CONFIG_PRINTK_NMI
 
 extern raw_spinlock_t logbuf_lock;
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 4/7] printk: make alt_printk available when config printk set
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2016-09-27 14:22 ` [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 5/7] printk: drop vprintk_func function Sergey Senozhatsky
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

alt_printk must be enabled on systems that have CONFIG_PRINTK set,
while the NMI-related functions must depend on CONFIG_PRINTK_NMI.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/printk.h     | 14 ++++++++------
 kernel/printk/Makefile     |  2 +-
 kernel/printk/alt_printk.c | 23 +++++++++++++++++------
 kernel/printk/internal.h   | 31 +++++++++++++------------------
 4 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 7510613..fae308a 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -131,17 +131,11 @@ void early_printk(const char *s, ...) { }
 #endif
 
 #ifdef CONFIG_PRINTK_NMI
-extern void alt_printk_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
-extern void alt_printk_flush(void);
-extern void alt_printk_flush_on_panic(void);
 #else
-static inline void alt_printk_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
-static inline void alt_printk_flush(void) { }
-static inline void alt_printk_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
@@ -193,6 +187,10 @@ void __init setup_log_buf(int early);
 __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+
+extern void alt_printk_init(void);
+extern void alt_printk_flush(void);
+extern void alt_printk_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -252,6 +250,10 @@ static inline void dump_stack_print_info(const char *log_lvl)
 static inline void show_regs_print_info(const char *log_lvl)
 {
 }
+
+static inline void alt_printk_init(void) { }
+static inline void alt_printk_flush(void) { }
+static inline void alt_printk_flush_on_panic(void) { }
 #endif
 
 extern asmlinkage void dump_stack(void) __cold;
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 3fc5e4e..d11838c 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,3 +1,3 @@
 obj-y	= printk.o
-obj-$(CONFIG_PRINTK_NMI)		+= alt_printk.o
+obj-$(CONFIG_PRINTK)		+= alt_printk.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/alt_printk.c b/kernel/printk/alt_printk.c
index 8978390..8ac1daa 100644
--- a/kernel/printk/alt_printk.c
+++ b/kernel/printk/alt_printk.c
@@ -40,7 +40,6 @@
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int alt_printk_irq_ready;
-atomic_t nmi_message_lost;
 
 #define ALT_LOG_BUF_LEN ((1 << CONFIG_ALT_PRINTK_LOG_BUF_SHIFT) -	\
 			 sizeof(atomic_t) - sizeof(struct irq_work))
@@ -50,7 +49,11 @@ struct alt_printk_seq_buf {
 	struct irq_work		work;	/* IRQ work that flushes the buffer */
 	unsigned char		buffer[ALT_LOG_BUF_LEN];
 };
+
+#ifdef CONFIG_PRINTK_NMI
 static DEFINE_PER_CPU(struct alt_printk_seq_buf, nmi_print_seq);
+atomic_t nmi_message_lost;
+#endif
 
 /*
  * There can be two alt_printk contexts at most - a `normal' alt_printk
@@ -320,8 +323,12 @@ void alt_printk_flush(void)
 {
 	int cpu;
 
-	for_each_possible_cpu(cpu)
-		__alt_printk_flush(&per_cpu(nmi_print_seq, cpu).work);
+	for_each_possible_cpu(cpu) {
+		if (IS_ENABLED(CONFIG_PRINTK_NMI))
+			__alt_printk_flush(&per_cpu(nmi_print_seq, cpu).work);
+
+		__alt_printk_flush(&per_cpu(alt_print_seq, cpu).work);
+	}
 }
 
 /**
@@ -356,12 +363,14 @@ void __init alt_printk_init(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct alt_printk_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+		struct alt_printk_seq_buf *s = &per_cpu(alt_print_seq, cpu);
 
 		init_irq_work(&s->work, __alt_printk_flush);
 
-		s = &per_cpu(alt_print_seq, cpu);
-		init_irq_work(&s->work, __alt_printk_flush);
+		if (IS_ENABLED(CONFIG_PRINTK_NMI)) {
+			s = &per_cpu(nmi_print_seq, cpu);
+			init_irq_work(&s->work, __alt_printk_flush);
+		}
 	}
 
 	/* Make sure that IRQ works are initialized before enabling. */
@@ -372,6 +381,7 @@ void __init alt_printk_init(void)
 	alt_printk_flush();
 }
 
+#ifdef CONFIG_PRINTK_NMI
 /*
  * Safe printk() for NMI context. It uses a per-CPU buffer to
  * store the message. NMIs are not nested, so there is always only
@@ -399,3 +409,4 @@ void printk_nmi_exit(void)
 {
 	__lockless_printk_exit();
 }
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index c5f970e..3eebf6d 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -18,21 +18,9 @@
 
 typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
 
-int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
-
 #ifdef CONFIG_PRINTK
 
-void alt_printk_enter(void);
-void alt_printk_exit(void);
-
-#else
-
-void alt_printk_enter(void) { }
-void alt_printk_exit(void) { }
-
-#endif /* CONFIG_PRINTK */
-
-#ifdef CONFIG_PRINTK_NMI
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 extern raw_spinlock_t logbuf_lock;
 
@@ -48,6 +36,18 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return this_cpu_read(printk_func)(fmt, args);
 }
 
+void alt_printk_enter(void);
+void alt_printk_exit(void);
+
+#else
+
+void alt_printk_enter(void) { }
+void alt_printk_exit(void) { }
+
+#endif /* CONFIG_PRINTK */
+
+#ifdef CONFIG_PRINTK_NMI
+
 extern atomic_t nmi_message_lost;
 static inline int get_nmi_message_lost(void)
 {
@@ -56,11 +56,6 @@ static inline int get_nmi_message_lost(void)
 
 #else /* CONFIG_PRINTK_NMI */
 
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
-{
-	return vprintk_default(fmt, args);
-}
-
 static inline int get_nmi_message_lost(void)
 {
 	return 0;
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 5/7] printk: drop vprintk_func function
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2016-09-27 14:22 ` [RFC][PATCH 4/7] printk: make alt_printk available when config printk set Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-27 14:22 ` [RFC][PATCH 6/7] printk: use alternative printk buffers Sergey Senozhatsky
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Drop the vprintk_func() wrapper and use this_cpu_read(printk_func)
directly, since alt_printk is now always enabled.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/internal.h | 4 ----
 kernel/printk/printk.c   | 4 ++--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3eebf6d..b030ca0 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -31,10 +31,6 @@ extern raw_spinlock_t logbuf_lock;
  * via per-CPU variable.
  */
 DECLARE_PER_CPU(printk_func_t, printk_func);
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
-{
-	return this_cpu_read(printk_func)(fmt, args);
-}
 
 void alt_printk_enter(void);
 void alt_printk_exit(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7ffbef..2afa16b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1922,7 +1922,7 @@ EXPORT_SYMBOL(vprintk_emit);
 
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
-	return vprintk_func(fmt, args);
+	return this_cpu_read(printk_func)(fmt, args);
 }
 EXPORT_SYMBOL(vprintk);
 
@@ -1984,7 +1984,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
 	int r;
 
 	va_start(args, fmt);
-	r = vprintk_func(fmt, args);
+	r = this_cpu_read(printk_func)(fmt, args);
 	va_end(args);
 
 	return r;
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2016-09-27 14:22 ` [RFC][PATCH 5/7] printk: drop vprintk_func function Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-29 13:00   ` Petr Mladek
  2016-09-27 14:22 ` [RFC][PATCH 7/7] printk: new printk() recursion detection Sergey Senozhatsky
  2016-09-29 13:25 ` [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Petr Mladek
  7 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Use the alt_printk buffer in printk recursion-prone blocks:
-- around the logbuf_lock protected sections in vprintk_emit() and
   console_unlock()
-- around down_trylock_console_sem() and up_console_sem()

Note that this addresses only deadlocks caused by recursive printk()
calls.

Examples:

1) printk() from logbuf_lock spin_lock section

Assume the following code:
  printk()
    raw_spin_lock(&logbuf_lock);
    WARN_ON(1);
    raw_spin_unlock(&logbuf_lock);

which now produces:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 366 at kernel/printk/printk.c:1811 vprintk_emit+0x1cd/0x438
 CPU: 0 PID: 366 Comm: bash
 Call Trace:
  [<ffffffff811e24d9>] dump_stack+0x68/0x92
  [<ffffffff810455cd>] __warn+0xc2/0xdd
  [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
  [<ffffffff81091527>] vprintk_emit+0x1cd/0x438
  [<ffffffff810918fe>] vprintk_default+0x1d/0x1f
  [<ffffffff810fdf6a>] printk+0x48/0x50
  [..]
  [<ffffffff813caaaa>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

2) printk() from semaphore sem->lock spin_lock section

Assume the following code

  printk()
    console_trylock()
      down_trylock()
        raw_spin_lock_irqsave(&sem->lock, flags);
        WARN_ON(1);
        raw_spin_unlock_irqrestore(&sem->lock, flags);

which now produces:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 363 at kernel/locking/semaphore.c:141 down_trylock+0x3d/0x62
 CPU: 1 PID: 363 Comm: bash
 Call Trace:
  [<ffffffff811e24e0>] dump_stack+0x68/0x92
  [<ffffffff810455cd>] __warn+0xc2/0xdd
  [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
  [<ffffffff810838df>] down_trylock+0x3d/0x62
  [<ffffffff8109177e>] ? vprintk_emit+0x3f9/0x414
  [<ffffffff810905cb>] console_trylock+0x31/0xeb
  [<ffffffff8109177e>] vprintk_emit+0x3f9/0x414
  [<ffffffff81091905>] vprintk_default+0x1d/0x1f
  [<ffffffff810fdf71>] printk+0x48/0x50
  [..]
  [<ffffffff813caaaa>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

3) printk() from console_unlock()

Assume the following code:

  printk()
    console_unlock()
      raw_spin_lock(&logbuf_lock);
      WARN_ON(1);
      raw_spin_unlock(&logbuf_lock);

which now produces:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 329 at kernel/printk/printk.c:2384 console_unlock+0x12d/0x559
 CPU: 1 PID: 329 Comm: bash
 Call Trace:
  [<ffffffff811f2380>] dump_stack+0x68/0x92
  [<ffffffff8103d649>] __warn+0xb8/0xd3
  [<ffffffff8103d6ca>] warn_slowpath_null+0x18/0x1a
  [<ffffffff8107f917>] console_unlock+0x12d/0x559
  [<ffffffff810772c4>] ? trace_hardirqs_on_caller+0x16d/0x189
  [<ffffffff810772ed>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff810800a6>] vprintk_emit+0x363/0x374
  [<ffffffff81080219>] vprintk_default+0x18/0x1a
  [<ffffffff810c7f77>] printk+0x43/0x4b
  [..]
  [<ffffffff814c2d6a>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

4) printk() from try_to_wake_up()

Assume the following code:

  printk()
    console_unlock()
      up()
        try_to_wake_up()
          raw_spin_lock_irqsave(&p->pi_lock, flags);
          WARN_ON(1);
          raw_spin_unlock_irqrestore(&p->pi_lock, flags);

which now produces:

 ------------[ cut here ]------------
 WARNING: CPU: 3 PID: 363 at kernel/sched/core.c:2028 try_to_wake_up+0x7f/0x4f7
 CPU: 3 PID: 363 Comm: bash
 Call Trace:
  [<ffffffff811e2557>] dump_stack+0x68/0x92
  [<ffffffff810455cd>] __warn+0xc2/0xdd
  [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
  [<ffffffff8106c8d0>] try_to_wake_up+0x7f/0x4f7
  [<ffffffff8106cd5d>] wake_up_process+0x15/0x17
  [<ffffffff813c82c6>] __up.isra.0+0x56/0x63
  [<ffffffff810839a3>] up+0x32/0x42
  [<ffffffff8108f624>] __up_console_sem+0x37/0x55
  [<ffffffff810910ff>] console_unlock+0x21e/0x4c2
  [<ffffffff810917bf>] vprintk_emit+0x41c/0x462
  [<ffffffff81091971>] vprintk_default+0x1d/0x1f
  [<ffffffff810fdfdd>] printk+0x48/0x50
  [..]
  [<ffffffff813cab2a>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

another example of WARN_ON() from scheduler code:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 361 at kernel/sched/core.c:2026 try_to_wake_up+0x6c/0x4e4
 CPU: 1 PID: 361 Comm: bash
 Call Trace:
  <IRQ>  [<ffffffff811e2544>] dump_stack+0x68/0x92
  [<ffffffff810455cd>] __warn+0xc2/0xdd
  [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
  [<ffffffff8106c8bd>] try_to_wake_up+0x6c/0x4e4
  [<ffffffff81089760>] ? lock_acquire+0x130/0x1c4
  [<ffffffff8106cd4a>] wake_up_process+0x15/0x17
  [<ffffffff8105a696>] wake_up_worker+0x28/0x2a
  [<ffffffff8105a869>] insert_work+0xa5/0xb1
  [<ffffffff8105bb99>] __queue_work+0x374/0x509
  [<ffffffff8105bd68>] queue_work_on+0x3a/0x64
  [<ffffffff812d7fd1>] dbs_irq_work+0x24/0x27
  [<ffffffff810ebbe7>] irq_work_run_list+0x43/0x67
  [<ffffffff810ebe26>] irq_work_run+0x2a/0x44
  [<ffffffff81019196>] smp_irq_work_interrupt+0x2e/0x35
  [<ffffffff813cccf9>] irq_work_interrupt+0x89/0x90
  [<ffffffff813cd1fe>] ? __do_softirq+0x96/0x434
  [<ffffffff813cd207>] ? __do_softirq+0x9f/0x434
  [<ffffffff8104b6e4>] irq_exit+0x40/0x90
  [<ffffffff813ccee9>] smp_apic_timer_interrupt+0x42/0x4d
  [<ffffffff813cc2d9>] apic_timer_interrupt+0x89/0x90
  <EOI>  [<ffffffff8101c578>] ? native_sched_clock+0x7/0xbc
  [<ffffffff81072db7>] ? sched_clock_cpu+0x17/0xc6
  [<ffffffff81072fef>] running_clock+0x10/0x12
  [<ffffffff810917d8>] vprintk_emit+0x448/0x462
  [<ffffffff8109195e>] vprintk_default+0x1d/0x1f
  [<ffffffff810fdfca>] printk+0x48/0x50
  [..]
  [<ffffffff813cab2a>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2afa16b..e5dacfb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		zap_locks();
 	}
 
-	lockdep_off();
+	alt_printk_enter();
 	/* This stops the holder of console_sem just where we want him */
 	raw_spin_lock(&logbuf_lock);
 	logbuf_cpu = this_cpu;
@@ -1900,12 +1900,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
-	lockdep_on();
+	alt_printk_exit();
 	local_irq_restore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
-		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore.  The release will print out buffers and wake up
@@ -1913,7 +1912,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock())
 			console_unlock();
-		lockdep_on();
 	}
 
 	return printed_len;
@@ -2239,8 +2237,18 @@ EXPORT_SYMBOL(console_lock);
  */
 int console_trylock(void)
 {
-	if (down_trylock_console_sem())
+	unsigned long flags;
+	int lock_failed;
+
+	local_irq_save(flags);
+	alt_printk_enter();
+	lock_failed = down_trylock_console_sem();
+	alt_printk_exit();
+	local_irq_restore(flags);
+
+	if (lock_failed)
 		return 0;
+
 	if (console_suspended) {
 		up_console_sem();
 		return 0;
@@ -2395,7 +2403,9 @@ void console_unlock(void)
 		size_t len;
 		int level;
 
-		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		local_irq_save(flags);
+		alt_printk_enter();
+		raw_spin_lock(&logbuf_lock);
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;
@@ -2456,6 +2466,7 @@ void console_unlock(void)
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, ext_text, ext_len, text, len);
 		start_critical_timings();
+		alt_printk_exit();
 		local_irq_restore(flags);
 
 		if (do_cond_resched)
@@ -2479,7 +2490,9 @@ void console_unlock(void)
 	 */
 	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	raw_spin_unlock(&logbuf_lock);
+	alt_printk_exit();
+	local_irq_restore(flags);
 
 	if (retry && console_trylock())
 		goto again;
-- 
2.10.0.372.g6fe1b14


* [RFC][PATCH 7/7] printk: new printk() recursion detection
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2016-09-27 14:22 ` [RFC][PATCH 6/7] printk: use alternative printk buffers Sergey Senozhatsky
@ 2016-09-27 14:22 ` Sergey Senozhatsky
  2016-09-29 13:19   ` Petr Mladek
  2016-09-29 13:25 ` [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Petr Mladek
  7 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-27 14:22 UTC (permalink / raw)
  To: Petr Mladek, Jan Kara
  Cc: Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

We can't use alt_printk while we are flushing the alt_printk buffers to
logbuf, because otherwise we would just flush the messages back into the
alt_printk buffers. At the same time, this opens the door for a potential
deadlock, should the flushing end up in a recursive printk() call:

  irq_work_run()
    alt_printk_flush_line()
      printk()
        WARN_ON()
        ^^^^^^^^^
        depending on circumstances this can result in printk-deadlock.

To cope with this, we can use the alt_printk context ->entry_count:
- __alt_printk_flush() sets ->entry_count to 1
- every printk() call from alt_printk_flush_foo() calls
  alt_printk_enter() and alt_printk_exit()
- alt_printk_enter() increments ->entry_count and tests the value:
  - if it's `==2' then we are in the normal alt_printk_flush()->printk() path
  - if it's `>=3' then we are in recursion. The first time we detect
    recursion (->entry_count == 3) we switch from the default printk() to
    alt_printk(), to avoid possible deadlocks.
- in alt_printk_exit() we test whether we were in the normal printk() path
  or in printk() recursion, roll back to the default printk() if so,
  and decrement ->entry_count.

->entry_count is a per-cpu variable, which must be touched only with
local IRQs disabled. NMI should never touch ->entry_count.
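
A rough walk-through of the intended ->entry_count transitions on one CPU
(a sketch of the description above, not a trace from a running system):

  __alt_printk_flush()                 entry_count: 0 -> 1
    alt_printk_flush_line()
      printk()
        vprintk_emit()
          alt_printk_enter()           entry_count: 1 -> 2, keep vprintk_default()
          WARN_ON(...)                 << printk() recursion
            printk()
              vprintk_emit()
                alt_printk_enter()     entry_count: 2 -> 3, switch to vprintk_alt()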

An example:

WARN_ON() from printk() while alt_printk_flush() was flushing messages
from the alt_printk buffer to logbuf:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 332 at kernel/printk/printk.c:1775 vprintk_emit+0x73/0x398
 CPU: 1 PID: 332 Comm: bash
 Call Trace:
  <IRQ>  [<ffffffff811f23ae>] dump_stack+0x68/0x92
  [<ffffffff8103d649>] __warn+0xb8/0xd3
  [<ffffffff8103d6ca>] warn_slowpath_null+0x18/0x1a
  [<ffffffff8107fdb6>] vprintk_emit+0x73/0x398
  [<ffffffff8108023d>] vprintk_default+0x18/0x1a
  [<ffffffff810c7fa5>] printk+0x43/0x4b
  [<ffffffff81080fc3>] alt_printk_flush_line+0x27/0x29
  [<ffffffff810810e1>] __alt_printk_flush+0xd7/0x12d
  [<ffffffff810b6356>] irq_work_run_list+0x43/0x67
  [<ffffffff810b6595>] irq_work_run+0x2a/0x44
  [<ffffffff81016925>] smp_trace_irq_work_interrupt+0x29/0x30
  [<ffffffff81016935>] smp_irq_work_interrupt+0x9/0xb
  [<ffffffff814c3fe6>] irq_work_interrupt+0x86/0x90
  <EOI>  [<ffffffff814c2577>] ? _raw_spin_unlock+0x2c/0x3f
  [<ffffffff8107fd12>] ? console_unlock+0x528/0x559
  [<ffffffff810800ca>] vprintk_emit+0x387/0x398
  [<ffffffff8108023d>] vprintk_default+0x18/0x1a
  [<ffffffff810c7fa5>] printk+0x43/0x4b
  [..]
  [<ffffffff814c2daa>] entry_SYSCALL_64_fastpath+0x18/0xad
 ---[ end trace ]---

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/alt_printk.c | 26 +++++++++++++++++++++++---
 kernel/printk/internal.h   |  5 +++++
 kernel/printk/printk.c     | 38 ++------------------------------------
 3 files changed, 30 insertions(+), 39 deletions(-)

diff --git a/kernel/printk/alt_printk.c b/kernel/printk/alt_printk.c
index 8ac1daa..0302ea3 100644
--- a/kernel/printk/alt_printk.c
+++ b/kernel/printk/alt_printk.c
@@ -69,6 +69,9 @@ atomic_t nmi_message_lost;
 #define MAX_ALT_PRINTK_CTX	1
 #endif
 
+#define CTX_ENTRY_FLUSHING	2
+#define CTX_ENTRY_FLUSHING_RECURSION	(CTX_ENTRY_FLUSHING + 1)
+
 struct alt_printk_ctx {
 	atomic_t	idx;
 	unsigned int	entry_count;
@@ -189,10 +192,15 @@ void alt_printk_enter(void)
 	 *     will see ->entry_count > 2.
 	 */
 	ctx->entry_count++;
-	if (ctx->entry_count > 1)
+	if (ctx->entry_count == CTX_ENTRY_FLUSHING)
 		return;
 
-	/* @TODO: do something sensible in case of printk() recursion */
+	/*
+	 * Once ->entry_count == CTX_ENTRY_FLUSHING_RECURSION, printk()
+	 * switches to alt_printk() buffers.
+	 */
+	if (ctx->entry_count > CTX_ENTRY_FLUSHING_RECURSION)
+		return;
 
 	__lockless_printk_enter(vprintk_alt);
 }
@@ -202,11 +210,23 @@ void alt_printk_exit(void)
 {
 	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
 
-	if (ctx->entry_count == 1)
+	/*
+	 * If we are returning from a normal alt_printk context or
+	 * from printk() recursion -- switch back to default printk.
+	 */
+	if (ctx->entry_count == 1 ||
+			ctx->entry_count == CTX_ENTRY_FLUSHING_RECURSION)
 		__lockless_printk_exit();
 	ctx->entry_count--;
 }
 
+bool recursed_printk_call(void)
+{
+	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
+
+	return oops_in_progress && (ctx->entry_count > CTX_ENTRY_FLUSHING);
+}
+
 static void alt_printk_flush_line(const char *text, int len)
 {
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index b030ca0..7d082b7 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -34,11 +34,16 @@ DECLARE_PER_CPU(printk_func_t, printk_func);
 
 void alt_printk_enter(void);
 void alt_printk_exit(void);
+bool recursed_printk_call(void);
 
 #else
 
 void alt_printk_enter(void) { }
 void alt_printk_exit(void) { }
+bool recursed_printk_call(void)
+{
+	return false;
+}
 
 #endif /* CONFIG_PRINTK */
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e5dacfb..47f1e0e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1748,18 +1748,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
-	static bool recursion_bug;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
 	size_t text_len = 0;
 	enum log_flags lflags = 0;
 	unsigned long flags;
-	int this_cpu;
 	int printed_len = 0;
 	int nmi_message_lost;
 	bool in_sched = false;
-	/* cpu currently holding logbuf_lock in this function */
-	static unsigned int logbuf_cpu = UINT_MAX;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -1770,42 +1766,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printk_delay();
 
 	local_irq_save(flags);
-	this_cpu = smp_processor_id();
+	alt_printk_enter();
 
-	/*
-	 * Ouch, printk recursed into itself!
-	 */
-	if (unlikely(logbuf_cpu == this_cpu)) {
-		/*
-		 * If a crash is occurring during printk() on this CPU,
-		 * then try to get the crash message out but make sure
-		 * we can't deadlock. Otherwise just return to avoid the
-		 * recursion and return - but flag the recursion so that
-		 * it can be printed at the next appropriate moment:
-		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
-			recursion_bug = true;
-			local_irq_restore(flags);
-			return 0;
-		}
+	if (recursed_printk_call())
 		zap_locks();
-	}
 
-	alt_printk_enter();
 	/* This stops the holder of console_sem just where we want him */
 	raw_spin_lock(&logbuf_lock);
-	logbuf_cpu = this_cpu;
-
-	if (unlikely(recursion_bug)) {
-		static const char recursion_msg[] =
-			"BUG: recent printk recursion!";
-
-		recursion_bug = false;
-		/* emit KERN_CRIT message */
-		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
-					 NULL, 0, recursion_msg,
-					 strlen(recursion_msg));
-	}
 
 	nmi_message_lost = get_nmi_message_lost();
 	if (unlikely(nmi_message_lost)) {
@@ -1898,7 +1865,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 						 dict, dictlen, text, text_len);
 	}
 
-	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	alt_printk_exit();
 	local_irq_restore(flags);
-- 
2.10.0.372.g6fe1b14


* Re: [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer
  2016-09-27 14:22 ` [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer Sergey Senozhatsky
@ 2016-09-29 12:26   ` Petr Mladek
  2016-09-30  1:05     ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-09-29 12:26 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky

On Tue 2016-09-27 23:22:33, Sergey Senozhatsky wrote:
> This patch extends the idea of NMI per-cpu buffers to regions
> that may cause recursive printk() calls and possible deadlocks.
> Namely, printk() can't handle printk calls from schedule code
> or printk() calls from lock debugging code (spin_dump() for instance);
> because those may be called with `sem->lock' already taken or any
> of critical scheduler locks (p->pi_lock, etc.). An example of
> a deadlock can be
> 
>  vprintk_emit()
>   console_unlock()
>    up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
>     wake_up_process()
>      try_to_wake_up()
>       ttwu_queue()
>        ttwu_do_activate()
>         ttwu_activate()
>          activate_task()
>           enqueue_task()
>            enqueue_task_fair()
>             cfs_rq_of()
>              task_of()
>               WARN_ON_ONCE(!entity_is_task(se))
>                vprintk_emit()
>                 console_trylock()
>                  down_trylock()
>                   raw_spin_lock_irqsave(&sem->lock, flags)
>                   ^^^^ deadlock
> 
> and some other cases.
> 
> Just like in NMI implementation, the solution uses a per-cpu
> `printk_func' pointer to 'redirect' printk() calls to a 'safe'
> callback, that store messages in a per-cpu buffer and flushes
> them back to logbuf buffer later.
> 
> Usage example:
> 
>  printk()
>   local_irq_save()
>   alt_printk_enter()

We need to make sure that exit() is called on the same CPU.
Therefore we need to disable preemption as well.

>   //
>   //  any printk() call from here will endup in vprintk_alt(),
>   //  that stores messages in a special per-CPU buffer.
>   //
>   alt_printk_exit()
>   local_irq_restore()
> 
> The patch only adds a alt_printk support, we don't use it yet.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/alt_printk.c | 160 +++++++++++++++++++++++++++++++++++++++++----
>  kernel/printk/internal.h   |  12 ++++
>  2 files changed, 158 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/alt_printk.c b/kernel/printk/alt_printk.c
> index 7178661..8978390 100644
> --- a/kernel/printk/alt_printk.c
> +++ b/kernel/printk/alt_printk.c
> @@ -1,5 +1,5 @@
>  /*
> - * alt_printk.c - Safe printk in NMI context
> + * alt_printk.c - Safe printk for printk-deadlock-prone contexts
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -53,24 +53,39 @@ struct alt_printk_seq_buf {
>  static DEFINE_PER_CPU(struct alt_printk_seq_buf, nmi_print_seq);
>  
>  /*
> - * Safe printk() for NMI context. It uses a per-CPU buffer to
> - * store the message. NMIs are not nested, so there is always only
> - * one writer running. But the buffer might get flushed from another
> - * CPU, so we need to be careful.
> + * There can be two alt_printk contexts at most - a `normal' alt_printk
> + * and NMI alt_printk context. Normal alt_printk context is the one that
> + * direct caller of printk() setups (either a process or IRQ) and it can
> + * be preempted only by NMI (if the platform supports NMI). NMI context
> + * can preempt normal alt_printk context, but cannot be preempted on its
> + * own.
>   */
> -static int vprintk_nmi(const char *fmt, va_list args)
> +#ifdef CONFIG_PRINTK_NMI
> +#define MAX_ALT_PRINTK_CTX	2
> +#else
> +#define MAX_ALT_PRINTK_CTX	1
> +#endif
> +
> +struct alt_printk_ctx {
> +	atomic_t	idx;
> +	unsigned int	entry_count;
> +	printk_func_t	saved_printk_func[MAX_ALT_PRINTK_CTX];
> +};
> +
> +static DEFINE_PER_CPU(struct alt_printk_seq_buf, alt_print_seq);
> +static DEFINE_PER_CPU(struct alt_printk_ctx, alt_printk_ctx);
> +
> +static int alt_printk_log_store(struct alt_printk_seq_buf *s,
> +		const char *fmt, va_list args)
>  {
> -	struct alt_printk_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> -	int add = 0;
> +	int add;
>  	size_t len;
>  
>  again:
>  	len = atomic_read(&s->len);
>  
> -	if (len >= sizeof(s->buffer)) {
> -		atomic_inc(&nmi_message_lost);
> +	if (len >= sizeof(s->buffer))
>  		return 0;
> -	}
>  
>  	/*
>  	 * Make sure that all old data have been read before the buffer was
> @@ -99,6 +114,96 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  	return add;
>  }
>  
> +/*
> + * Lockless printk(), to avoid deadlocks should the printk() recurse
> + * into itself. It uses a per-CPU buffer to store the message, just like
> + * NMI.
> + */
> +static int vprintk_alt(const char *fmt, va_list args)
> +{
> +	struct alt_printk_seq_buf *s = this_cpu_ptr(&alt_print_seq);
> +
> +	return alt_printk_log_store(s, fmt, args);
> +}
> +
> +/*
> + * We must keep track of `printk_func' because alt_printk
> + * context can be preempted by NMI alt_printk context.
> + *
> + * Consider the following example:
> + *
> + * vprintk_emit()
> + * 	alt_printk_enter()
> + * 		printk_func = vprintk_alt;
> + *
> + * -> NMI
> + *  	printk_nmi_enter()
> + *		printk_func = vprintk_nmi;
> + *	printk_nmi_exit()
> + *		printk_func = vprintk_default;
> + *		^^^^^^^^^^^
> + * <- NMI
> + *
> + * 	printk("foo") -> vprintk_default();
> + *
> + * Thus we must restore the original `printk_func' value, the one
> + * NMI saw at printk_nmi_enter() time.
> + */
> +static void __lockless_printk_enter(printk_func_t new_func)
> +{
> +	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +	int idx = atomic_inc_return(&ctx->idx) - 1;
> +
> +	ctx->saved_printk_func[idx] = this_cpu_read(printk_func);
> +	this_cpu_write(printk_func, new_func);
> +}
> +
> +static void __lockless_printk_exit(void)
> +{
> +	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +	int idx = atomic_read(&ctx->idx) - 1;
> +
> +	this_cpu_write(printk_func, ctx->saved_printk_func[idx]);
> +	atomic_dec(&ctx->idx);
> +}
> +
> +/* Local IRQs must be disabled; can be preempted by NMI. */
> +void alt_printk_enter(void)
> +{
> +	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +
> +	/*
> +	 * We can't switch `printk_func' while the CPU is flushing its
> +	 * alternative buffers. At the same time, we leave flushing
> +	 * `unprotected', because we always use vprintk_default() there.
> +	 *
> +	 * ->entry_count can detect printk() recursion from flushing context:
> +	 *  -- alt_printk_flush() sets ->entry_count to 1
> +	 *  -- every vprintk_default() call from alt_printk_flush() increments
> +	 *     ->entry_count to 2 when it enters the recursion un-safe region
> +	 *     and decrements it back to 1 when it leaves that region
> + *  -- thus, if printk() recurses from the recursion-unsafe region, we
> + *     will see ->entry_count > 2.
> +	 */
> +	ctx->entry_count++;
> +	if (ctx->entry_count > 1)
> +		return;
> +
> +	/* @TODO: do something sensible in case of printk() recursion */
> +
> +	__lockless_printk_enter(vprintk_alt);
> +}
> +
> +/* Local IRQs must be disabled; can be preempted by NMI. */
> +void alt_printk_exit(void)
> +{
> +	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +
> +	if (ctx->entry_count == 1)
> +		__lockless_printk_exit();
> +	ctx->entry_count--;
> +}

Huh, this is complicated like hell. I think that it is like this
for two reasons. First, it tries to change several values in a
lockless way. Second, it tries to be kind of generic.

Note that it is not that generic. idx is increased or decreased only
when entry_count == 1 or when called from nmi_enter()/exit().
Therefore it could not easily be extended to yet another alt buffer.
Also, it relies on the fact that nmi_enter()/exit() calls are not
nested.

What do you think about my approach with the printk_context per-CPU
value from the WARN_DEFERRED() patchset? The main idea is that
the entry()/exit() functions manipulate a preempt_count-like per-CPU
variable, and the printk() function selects the safe implementation
according to the current state.
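
Just to illustrate the idea, a rough sketch (the names below are made
up for this mail, it is not the actual WARN_DEFERRED() code):

	static DEFINE_PER_CPU(int, printk_context);

	/* nests like preempt_count; the caller keeps preemption disabled */
	void printk_context_enter(void)
	{
		this_cpu_inc(printk_context);
	}

	void printk_context_exit(void)
	{
		this_cpu_dec(printk_context);
	}

	int vprintk_func(const char *fmt, va_list args)
	{
		if (in_nmi())
			return vprintk_nmi(fmt, args);
		if (this_cpu_read(printk_context))
			return vprintk_alt(fmt, args);
		return vprintk_default(fmt, args);
	}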

>  static void alt_printk_flush_line(const char *text, int len)
>  {
>  	/*
> @@ -110,7 +215,6 @@ static void alt_printk_flush_line(const char *text, int len)
>  		printk_deferred("%.*s", len, text);
>  	else
>  		printk("%.*s", len, text);
> -
>  }
>  
>  /*
> @@ -135,6 +239,7 @@ static void __alt_printk_flush(struct irq_work *work)
>  		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
>  	struct alt_printk_seq_buf *s = container_of(work,
>  			struct alt_printk_seq_buf, work);
> +	struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
>  	unsigned long flags;
>  	size_t len, size;
>  	int i, last_i;
> @@ -147,6 +252,11 @@ static void __alt_printk_flush(struct irq_work *work)
>  	 * a backtrace.
>  	 */
>  	raw_spin_lock_irqsave(&read_lock, flags);
> +	/*
> +	 * Forbid the alt_printk on this CPU, we want to flush messages to
> +	 * logbuf, not to alt_printk buffer again.
> +	 */
> +	ctx->entry_count++;

This looks very strange. If entry_count is not zero, it means that we
are not in a safe context to flush the alternative buffer. Or am
I missing something?
 

My other concern about this approach was that it would spread printk()
messages to even more buffers. But I am not scared anymore. The
new buffer will be used only for printk-internal errors that would
normally cause a deadlock. The important thing is that we are able
to share the implementation handling the extra buffer.

So, I think that this approach has a chance if we clean it up a bit.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-27 14:22 ` [RFC][PATCH 6/7] printk: use alternative printk buffers Sergey Senozhatsky
@ 2016-09-29 13:00   ` Petr Mladek
  2016-09-30  1:15     ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-09-29 13:00 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky

On Tue 2016-09-27 23:22:36, Sergey Senozhatsky wrote:
> Use the alt_printk buffer in printk recursion-prone blocks:
> -- around logbuf_lock protected sections in vprintk_emit() and
>    console_unlock()
> -- around down_trylock_console_sem() and up_console_sem()
> 
> Note that it addresses deadlocks caused by recursive printk()
> calls only.
> 
> Examples:
> 
> 1) printk() from logbuf_lock spin_lock section
> 
> Assume the following code:
>   printk()
>     raw_spin_lock(&logbuf_lock);
>     WARN_ON(1);
>     raw_spin_unlock(&logbuf_lock);
> 
> which now produces:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 366 at kernel/printk/printk.c:1811 vprintk_emit+0x1cd/0x438
>  CPU: 0 PID: 366 Comm: bash
>  Call Trace:
>   [<ffffffff811e24d9>] dump_stack+0x68/0x92
>   [<ffffffff810455cd>] __warn+0xc2/0xdd
>   [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
>   [<ffffffff81091527>] vprintk_emit+0x1cd/0x438
>   [<ffffffff810918fe>] vprintk_default+0x1d/0x1f
>   [<ffffffff810fdf6a>] printk+0x48/0x50
>   [..]
>   [<ffffffff813caaaa>] entry_SYSCALL_64_fastpath+0x18/0xad
>  ---[ end trace ]---
> 
> 2) printk() from semaphore sem->lock spin_lock section
> 
> Assume the following code
> 
>   printk()
>     console_trylock()
>       down_trylock()
>         raw_spin_lock_irqsave(&sem->lock, flags);
>         WARN_ON(1);
>         raw_spin_unlock_irqrestore(&sem->lock, flags);
> 
> which now produces:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 363 at kernel/locking/semaphore.c:141 down_trylock+0x3d/0x62
>  CPU: 1 PID: 363 Comm: bash
>  Call Trace:
>   [<ffffffff811e24e0>] dump_stack+0x68/0x92
>   [<ffffffff810455cd>] __warn+0xc2/0xdd
>   [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
>   [<ffffffff810838df>] down_trylock+0x3d/0x62
>   [<ffffffff8109177e>] ? vprintk_emit+0x3f9/0x414
>   [<ffffffff810905cb>] console_trylock+0x31/0xeb
>   [<ffffffff8109177e>] vprintk_emit+0x3f9/0x414
>   [<ffffffff81091905>] vprintk_default+0x1d/0x1f
>   [<ffffffff810fdf71>] printk+0x48/0x50
>   [..]
>   [<ffffffff813caaaa>] entry_SYSCALL_64_fastpath+0x18/0xad
>  ---[ end trace ]---
> 
> 3) printk() from console_unlock()
> 
> Assume the following code:
> 
>   printk()
>     console_unlock()
>       raw_spin_lock(&logbuf_lock);
>       WARN_ON(1);
>       raw_spin_unlock(&logbuf_lock);
> 
> which now produces:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 329 at kernel/printk/printk.c:2384 console_unlock+0x12d/0x559
>  CPU: 1 PID: 329 Comm: bash
>  Call Trace:
>   [<ffffffff811f2380>] dump_stack+0x68/0x92
>   [<ffffffff8103d649>] __warn+0xb8/0xd3
>   [<ffffffff8103d6ca>] warn_slowpath_null+0x18/0x1a
>   [<ffffffff8107f917>] console_unlock+0x12d/0x559
>   [<ffffffff810772c4>] ? trace_hardirqs_on_caller+0x16d/0x189
>   [<ffffffff810772ed>] ? trace_hardirqs_on+0xd/0xf
>   [<ffffffff810800a6>] vprintk_emit+0x363/0x374
>   [<ffffffff81080219>] vprintk_default+0x18/0x1a
>   [<ffffffff810c7f77>] printk+0x43/0x4b
>   [..]
>   [<ffffffff814c2d6a>] entry_SYSCALL_64_fastpath+0x18/0xad
>  ---[ end trace ]---
> 
> 4) printk() from try_to_wake_up()
> 
> Assume the following code:
> 
>   printk()
>     console_unlock()
>       up()
>         try_to_wake_up()
>           raw_spin_lock_irqsave(&p->pi_lock, flags);
>           WARN_ON(1);
>           raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> 
> which now produces:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 3 PID: 363 at kernel/sched/core.c:2028 try_to_wake_up+0x7f/0x4f7
>  CPU: 3 PID: 363 Comm: bash
>  Call Trace:
>   [<ffffffff811e2557>] dump_stack+0x68/0x92
>   [<ffffffff810455cd>] __warn+0xc2/0xdd
>   [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
>   [<ffffffff8106c8d0>] try_to_wake_up+0x7f/0x4f7
>   [<ffffffff8106cd5d>] wake_up_process+0x15/0x17
>   [<ffffffff813c82c6>] __up.isra.0+0x56/0x63
>   [<ffffffff810839a3>] up+0x32/0x42
>   [<ffffffff8108f624>] __up_console_sem+0x37/0x55
>   [<ffffffff810910ff>] console_unlock+0x21e/0x4c2
>   [<ffffffff810917bf>] vprintk_emit+0x41c/0x462
>   [<ffffffff81091971>] vprintk_default+0x1d/0x1f
>   [<ffffffff810fdfdd>] printk+0x48/0x50
>   [..]
>   [<ffffffff813cab2a>] entry_SYSCALL_64_fastpath+0x18/0xad
>  ---[ end trace ]---
> 
> another example of WARN_ON() from scheduler code:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 361 at kernel/sched/core.c:2026 try_to_wake_up+0x6c/0x4e4
>  CPU: 1 PID: 361 Comm: bash
>  Call Trace:
>   <IRQ>  [<ffffffff811e2544>] dump_stack+0x68/0x92
>   [<ffffffff810455cd>] __warn+0xc2/0xdd
>   [<ffffffff81045658>] warn_slowpath_null+0x1d/0x1f
>   [<ffffffff8106c8bd>] try_to_wake_up+0x6c/0x4e4
>   [<ffffffff81089760>] ? lock_acquire+0x130/0x1c4
>   [<ffffffff8106cd4a>] wake_up_process+0x15/0x17
>   [<ffffffff8105a696>] wake_up_worker+0x28/0x2a
>   [<ffffffff8105a869>] insert_work+0xa5/0xb1
>   [<ffffffff8105bb99>] __queue_work+0x374/0x509
>   [<ffffffff8105bd68>] queue_work_on+0x3a/0x64
>   [<ffffffff812d7fd1>] dbs_irq_work+0x24/0x27
>   [<ffffffff810ebbe7>] irq_work_run_list+0x43/0x67
>   [<ffffffff810ebe26>] irq_work_run+0x2a/0x44
>   [<ffffffff81019196>] smp_irq_work_interrupt+0x2e/0x35
>   [<ffffffff813cccf9>] irq_work_interrupt+0x89/0x90
>   [<ffffffff813cd1fe>] ? __do_softirq+0x96/0x434
>   [<ffffffff813cd207>] ? __do_softirq+0x9f/0x434
>   [<ffffffff8104b6e4>] irq_exit+0x40/0x90
>   [<ffffffff813ccee9>] smp_apic_timer_interrupt+0x42/0x4d
>   [<ffffffff813cc2d9>] apic_timer_interrupt+0x89/0x90
>   <EOI>  [<ffffffff8101c578>] ? native_sched_clock+0x7/0xbc
>   [<ffffffff81072db7>] ? sched_clock_cpu+0x17/0xc6
>   [<ffffffff81072fef>] running_clock+0x10/0x12
>   [<ffffffff810917d8>] vprintk_emit+0x448/0x462
>   [<ffffffff8109195e>] vprintk_default+0x1d/0x1f
>   [<ffffffff810fdfca>] printk+0x48/0x50
>   [..]
>   [<ffffffff813cab2a>] entry_SYSCALL_64_fastpath+0x18/0xad
>  ---[ end trace ]---
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2afa16b..e5dacfb 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		zap_locks();
>  	}
>  
> -	lockdep_off();
> +	alt_printk_enter();

IMHO, we can no longer enter vprintk_emit() recursively. The same
section that was guarded by logbuf_cpu is guarded by
alt_printk_enter()/exit() now.

IMHO, we could remove all the logic around the recursion. Then we
could even disable/enable irqs inside alt_printk_enter()/exit().

And to correct myself from the previous mail: it is enough to disable
IRQs. That already makes sure that we will not be preempted and will
stay on the same CPU.

>  	/* This stops the holder of console_sem just where we want him */
>  	raw_spin_lock(&logbuf_lock);
>  	logbuf_cpu = this_cpu;
> @@ -1900,12 +1900,11 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
> -	lockdep_on();
> +	alt_printk_exit();
>  	local_irq_restore(flags);
>  
>  	/* If called from the scheduler, we can not call up(). */
>  	if (!in_sched) {
> -		lockdep_off();
>  		/*
>  		 * Try to acquire and then immediately release the console
>  		 * semaphore.  The release will print out buffers and wake up
> @@ -1913,7 +1912,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 */
>  		if (console_trylock())
>  			console_unlock();
> -		lockdep_on();
>  	}
>  
>  	return printed_len;
> @@ -2239,8 +2237,18 @@ EXPORT_SYMBOL(console_lock);
>   */
>  int console_trylock(void)
>  {
> -	if (down_trylock_console_sem())
> +	unsigned long flags;
> +	int lock_failed;
> +
> +	local_irq_save(flags);
> +	alt_printk_enter();
> +	lock_failed = down_trylock_console_sem();
> +	alt_printk_exit();
> +	local_irq_restore(flags);
> +
> +	if (lock_failed)
>  		return 0;
> +
>  	if (console_suspended) {
>  		up_console_sem();
>  		return 0;
> @@ -2395,7 +2403,9 @@ void console_unlock(void)
>  		size_t len;
>  		int level;
>  
> -		raw_spin_lock_irqsave(&logbuf_lock, flags);
> +		local_irq_save(flags);
> +		alt_printk_enter();
> +		raw_spin_lock(&logbuf_lock);
>  		if (seen_seq != log_next_seq) {
>  			wake_klogd = true;
>  			seen_seq = log_next_seq;
> @@ -2456,6 +2466,7 @@ void console_unlock(void)
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(level, ext_text, ext_len, text, len);
>  		start_critical_timings();
> +		alt_printk_exit();
>  		local_irq_restore(flags);
>  
>  		if (do_cond_resched)
> @@ -2479,7 +2490,9 @@ void console_unlock(void)
>  	 */
>  	raw_spin_lock(&logbuf_lock);
>  	retry = console_seq != log_next_seq;
> -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +	raw_spin_unlock(&logbuf_lock);
> +	alt_printk_exit();
> +	local_irq_restore(flags);

We should mention that this patch makes printk_deferred() an obsolete
artefact. It opens the door for another big cleanup and relief.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 7/7] printk: new printk() recursion detection
  2016-09-27 14:22 ` [RFC][PATCH 7/7] printk: new printk() recursion detection Sergey Senozhatsky
@ 2016-09-29 13:19   ` Petr Mladek
  2016-09-30  2:00     ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-09-29 13:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky

On Tue 2016-09-27 23:22:37, Sergey Senozhatsky wrote:
> We can't use alt_printk when we are flushing alt_printk buffers to logbuf,
> because otherwise we would just flush messages back to alt_printk buffers.
> At the same time, this opens the door to a potential deadlock, should flushing
> end up in a recursive printk() call:
> 
>   irq_work_run()
>     alt_printk_flush_line()
>       printk()
>         WARN_ON()
>         ^^^^^^^^^
>         depending on circumstances this can result in printk-deadlock.
> 
> To cope with it, we can use alt_printk context ->entry_count:
> - __alt_printk_flush() sets ->entry_count to 1
> - every printk() call from alt_printk_flush_foo() calls
>   alt_printk_enter() and alt_printk_exit()
> - alt_printk_enter() increments ->entry_count and tests the value:
>   - if it's `==2' then we are in normal alt_printk_flush()->printk() path
>   - if it's `>=3' then we are in recursion. The first time we detect
>     recursion (->entry_count == 3) we switch from default printk() to
>     alt_printk(), to avoid possible deadlocks.
> - in alt_printk_exit() we test if we were in normal printk() path or
>   in printk() recursion and rollback to default printk() if so;
>   and decrement ->entry_count.

I am sorry but I do not understand this much. printk() should set the
alternative implementation in the critical section by default.
Why do we need to handle this so specially?

Is it because of flushing in NMI context when panicking? I would call
vprintk_emit() directly from the flush_line() function in this case.
Then all other possible error printk's will get redirected to the
NMI buffer, which is good enough.
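
A minimal sketch of what I mean (the helper name is made up; it assumes
the flush side keeps passing around the already formatted text):

	static __printf(1, 2) void alt_printk_flush_emit(const char *fmt, ...)
	{
		va_list args;

		va_start(args, fmt);
		/* go straight to logbuf, bypassing the printk_func redirection */
		vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
		va_end(args);
	}

	/* and alt_printk_flush_line() would call, when panicking in NMI: */
	alt_printk_flush_emit("%.*s", len, text);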

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
  2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2016-09-27 14:22 ` [RFC][PATCH 7/7] printk: new printk() recursion detection Sergey Senozhatsky
@ 2016-09-29 13:25 ` Petr Mladek
  2016-09-30  2:43   ` Sergey Senozhatsky
  7 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-09-29 13:25 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Tejun Heo, Calvin Owens, linux-kernel,
	Sergey Senozhatsky

On Tue 2016-09-27 23:22:30, Sergey Senozhatsky wrote:
> 	Hello,
> 
> 	RFC
> 
> 	This patch set extends a lock-less NMI per-cpu buffers idea to
> handle recursive printk() calls. The basic mechanism is pretty much the
> same -- at the beginning of a deadlock-prone section we switch to lock-less
> printk callback, and return back to a default printk implementation at the
> end; the messages are getting flushed to a logbuf buffer from a safer
> context.

I was skeptical but I really like this way now.

The switching of the buffers is a bit hairy in this version but I
think that we could make it much better.

Other than that it looks like a big win. It kills a lot of
printk-related pain points. And it will not be that complicated
after all.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer
  2016-09-29 12:26   ` Petr Mladek
@ 2016-09-30  1:05     ` Sergey Senozhatsky
  2016-09-30 11:35       ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-30  1:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel, Sergey Senozhatsky

On (09/29/16 14:26), Petr Mladek wrote:
[..]
> >  printk()
> >   local_irq_save()
> >   alt_printk_enter()
> 
> We need to make sure that exit() is called on the same CPU.
> Therefore we need to disable preemption as well.

local_irq_save() does this for us, we can't get sched tick or
re-sched IPI, and even more - we eliminate race conditions on
this CPU. only one path can touch alt_printk related stuff,
NMI works with its own buffer.

[..]
> What do you think about my approach with the printk_context per-CPU
> value from the WARN_DEFERRED() patchset? The main idea is that
> the entry()/exit() functions manipulate preempt_count-like per-CPU
> variable. The printk() function selects the safe implementation
> according to the current state.

I'll take a look.

hm, what I was thinking of... you are right, this all smells a bit
bad. I'll revisit it.

thanks!

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-29 13:00   ` Petr Mladek
@ 2016-09-30  1:15     ` Sergey Senozhatsky
  2016-09-30 11:15       ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-30  1:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel, Sergey Senozhatsky

On (09/29/16 15:00), Petr Mladek wrote:
[..]
> > @@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  		zap_locks();
> >  	}
> >  
> > -	lockdep_off();
> > +	alt_printk_enter();
> 
> IMHO, we could not longer enter vprintk_emit() recursively. The same
> section that was guarded by logbuf_cpu is guarded by
> alt_printk_enter()/exit() now.

you might be very right here. I'll take a look.

> IMHO, we could remove all the logic around the recursion. Then we
> could even disable/enable irqs inside alt_printk_enter()/exit().

I was thinking of doing something like this; but that would require
storing 'unsigned long' flags in per-cpu data

	alt_enter()
	{
		unsigned long flags;

		local_irq_save(flags);
		ctx = this_cpu_ptr();
		ctx->flags = flags;
		...
	}

	alt_exit()
	{
		ctx = this_cpu_ptr();
		...
		local_irq_restore(ctx->flags);
	}


and the decision was to keep `unsigned long flags' on stack in the
alt_enter/exit caller. besides in most of the cases we already have
it (in vprintk_emit() and console_unlock()).

but I can certainly hide these details in alt_enter/exit.


> And to correct myself from the previous mail. It is enough to disable
> IRQs. It is enough to make sure that we will not preempt and will
> stay on the same CPU.

ah, no prob.

> > @@ -2479,7 +2490,9 @@ void console_unlock(void)
> >  	 */
> >  	raw_spin_lock(&logbuf_lock);
> >  	retry = console_seq != log_next_seq;
> > -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> > +	raw_spin_unlock(&logbuf_lock);
> > +	alt_printk_exit();
> > +	local_irq_restore(flags);
> 
> We should mention that this patch makes an obsolete artefact from
> printk_deferred(). It opens the door for another big cleanup and
> relief.

do you mean that, once alt_printk is done properly, we can drop
printk_deferred()? I was thinking of it, but decided not to
mention/touch it in this patch set.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 7/7] printk: new printk() recursion detection
  2016-09-29 13:19   ` Petr Mladek
@ 2016-09-30  2:00     ` Sergey Senozhatsky
  0 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-30  2:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel, Sergey Senozhatsky

On (09/29/16 15:19), Petr Mladek wrote:
> I am sorry but I do not understand this much. printk() should set the
> alternative implementation in the critical section by default.
> Why do we need to handle this so specially?
> 
> Is it because of flushing in NMI context when panicing? I would call
> vprintk_emit() directly from the flush_line() function in this case.
> Then all other possible error printk's will get redirected to the
> NMI buffer which is good enouh.

I'm going to re-do the entire thing. I had some cases in mind, like
WARN from vsnprintf from printk from alt_printk_flushing from panic.
or something like this. perhaps too complicated, will re-think it.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
  2016-09-29 13:25 ` [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Petr Mladek
@ 2016-09-30  2:43   ` Sergey Senozhatsky
  2016-09-30 11:27     ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-09-30  2:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel, Sergey Senozhatsky

On (09/29/16 15:25), Petr Mladek wrote:
> On Tue 2016-09-27 23:22:30, Sergey Senozhatsky wrote:
> > 	Hello,
> > 
> > 	RFC
> > 
> > 	This patch set extends a lock-less NMI per-cpu buffers idea to
> > handle recursive printk() calls. The basic mechanism is pretty much the
> > same -- at the beginning of a deadlock-prone section we switch to lock-less
> > printk callback, and return back to a default printk implementation at the
> > end; the messages are getting flushed to a logbuf buffer from a safer
> > context.
> 
> I was skeptical but I really like this way now.
>
> The switching of the buffers is a bit hairy in this version but I
> think that we could make it much better.
> 
> Other than that it looks like a big win. It kills a lot of
> printk-related pain points. And it will not be that complicated
> after all.

many thanks for looking at this train wreck.

so, like I said, it addresses printk()-recursion in *ideally* quite
a minimalistic way -- just several alt_printk_enter/exit calls in
printk.c, without ever touching any other parts of the kernel.

gunning down printk deadlocks in general, however, requires much more
effort; or even a completely different approach.

a) a lock-less printk() by default
   um, `#define printk alt_printk'. but this will break printk() from irq.
   and the ordering of messages from per-cpu buffers may be far from correct.

b) combining a DEFERRED_WARN + alt_printk
   DEFERRED_WARN potentially is a never ending thing. we can add some
   lockdep annotations, perhaps, and hope that error handling branches
   that may contain WARN_ONs/printk-s will be executed with prove_locking
   enabled on someone's machine.

c) ...

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-30  1:15     ` Sergey Senozhatsky
@ 2016-09-30 11:15       ` Petr Mladek
  2016-10-01  2:48         ` Sergey Senozhatsky
                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Petr Mladek @ 2016-09-30 11:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Fri 2016-09-30 10:15:44, Sergey Senozhatsky wrote:
> On (09/29/16 15:00), Petr Mladek wrote:
> [..]
> > > @@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> > >  		zap_locks();
> > >  	}
> > >  
> > > -	lockdep_off();
> > > +	alt_printk_enter();
> > 
> > IMHO, we could not longer enter vprintk_emit() recursively. The same
> > section that was guarded by logbuf_cpu is guarded by
> > alt_printk_enter()/exit() now.
> 
> you might be very right here. I'll take a look.
> 
> > IMHO, we could remove all the logic around the recursion. Then we
> > could even disable/enable irqs inside alt_printk_enter()/exit().
> 
> I was thinking of doing something like this; but that would require
> storing 'unsigned long' flags in per-cpu data
> 
> 	alt_enter()
> 	{
> 		unsinged long flags;
> 
> 		local_irq_save(flags);
> 		ctx = this_cpu_ptr();
> 		ctx->flags = flags;
> 		...
> 	}
> 
> 	alt_exit()
> 	{
> 		ctx = this_cpu_ptr();
> 		...
> 		local_irq_restore(ctx->flags);
> 	}
>
> and the decision was to keep `unsigned long flags' on stack in the
> alt_enter/exit caller. besides in most of the cases we already have
> it (in vprintk_emit() and console_unlock()).

I would pass the pointer to flags as alt_enter() parameter.
> 
> but I can certainly hide these details in alt_enter/exit.
>
> > > @@ -2479,7 +2490,9 @@ void console_unlock(void)
> > >  	 */
> > >  	raw_spin_lock(&logbuf_lock);
> > >  	retry = console_seq != log_next_seq;
> > > -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> > > +	raw_spin_unlock(&logbuf_lock);
> > > +	alt_printk_exit();
> > > +	local_irq_restore(flags);
> > 
> > We should mention that this patch makes an obsolete artefact from
> > printk_deferred(). It opens the door for another big cleanup and
> > relief.
> 
> do you mean that, once alt_printk is done properly, we can drop
> printk_deferred()? I was thinking of it, but decided not to
> mention/touch it in this patch set.

My understanding is the following:

The difference between normal printk() and printk_deferred() is
that the latter does not call console_trylock()/console_unlock().
It means that printk_deferred() can avoid recursion only from these
two calls.

printk_deferred() is used only in scheduler and timekeeping code.
Therefore it prevents only limited number of possible recursions
and deadlocks at the moment.

This patch guards most of the two calls in a more generic way.
The redirected parts prevent recursion not only into the
code guarded by console_sem but also into parts guarded
by logbuf_lock.

In other words, this patch is supposed to handle a superset
of the deadlocks that are currently prevented by printk_deferred().
If this is true, we no longer need printk_deferred().

The only question is if this patch guards enough parts of
console_try_lock()/console_unlock() to handle the superset
of the possible deadlocks.

I see that it does not guard two up_console_sem() calls
from console_unlock(). But this can be fixed in the next
version.

Or is there any other catch that I do not see at the moment?

In each case, getting rid of printk_deferred() could be
a fantastic selling point for this patchset.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
  2016-09-30  2:43   ` Sergey Senozhatsky
@ 2016-09-30 11:27     ` Petr Mladek
  2016-10-01  3:02       ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-09-30 11:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Fri 2016-09-30 11:43:07, Sergey Senozhatsky wrote:
> On (09/29/16 15:25), Petr Mladek wrote:
> > On Tue 2016-09-27 23:22:30, Sergey Senozhatsky wrote:
> > > 	Hello,
> > > 
> > > 	RFC
> > > 
> > > 	This patch set extends a lock-less NMI per-cpu buffers idea to
> > > handle recursive printk() calls. The basic mechanism is pretty much the
> > > same -- at the beginning of a deadlock-prone section we switch to lock-less
> > > printk callback, and return back to a default printk implementation at the
> > > end; the messages are getting flushed to a logbuf buffer from a safer
> > > context.
> > 
> > I was skeptical but I really like this way now.
> >
> > The switching of the buffers is a bit hairy in this version but I
> > think that we could make it much better.
> > 
> > Other than that it looks like a big win. It kills a lot of
> > printk-related pain points. And it will not be that complicated
> > after all.
> 
> many thanks for looking at this train wreck.
> 
> so, like I said, it addresses printk()-recursion in *ideally* quite
> a minimalistic way -- just several alt_printk_enter/exit calls in
> printk.c, without ever touching any other parts of the kernel.
> 
> gunning down printk deadlocks in general, however, requires much more
> effort; or even a completely different approach.
> 
> a) a lock-less printk() by default
>    um, `#define printk alt_printk'. but this will break printk() from irq.
>    and the ordering of messages from per-cpu buffers may be far from correct.

Well, the current vprintk_nmi() is lockless. The alternative printk()
is going to use the same code, so it will be lockless as well. It
means that even this patchset is supposed to avoid all possible
deadlocks via printk() calls.

There is still a risk of an infinite recursion. But vprintk_nmi()
bails out early when the buffer is full. This should minimize
the risk. In fact, the recursion would become a rather theoretical
problem.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer
  2016-09-30  1:05     ` Sergey Senozhatsky
@ 2016-09-30 11:35       ` Petr Mladek
  0 siblings, 0 replies; 35+ messages in thread
From: Petr Mladek @ 2016-09-30 11:35 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Fri 2016-09-30 10:05:28, Sergey Senozhatsky wrote:
> On (09/29/16 14:26), Petr Mladek wrote:
> [..]
> > >  printk()
> > >   local_irq_save()
> > >   alt_printk_enter()
> > 
> > We need to make sure that exit() is called on the same CPU.
> > Therefore we need to disable preemption as well.
> 
> local_irq_save() does this for us, we can't get sched tick or
> re-sched IPI, and even more - we eliminate race conditions on
> this CPU. only one path can touch alt_printk related stuff,
> NMI works with its own buffer.
> 
> [..]
> > What do you think about my approach with the printk_context per-CPU
> > value from the WARN_DEFERRED() patchset? The main idea is that
> > the entry()/exit() functions manipulate preempt_count-like per-CPU
> > variable. The printk() function selects the safe implementation
> > according to the current state.
> 
> I'll take a look.
> 
> I'll revisit it.

Just a side note. If you make it less generic then please use a more
meaningful name for the alternative printk stuff. The following
comes to my mind:

	     printk_safe_enter();
	     printk_safe_exit();
	     vprintk_safe();

IMHO, the "_safe" suffix often means a variant that prevents a
possible deadlock.

I wonder where these patches will end up, but it looks promising.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-30 11:15       ` Petr Mladek
@ 2016-10-01  2:48         ` Sergey Senozhatsky
  2016-10-04 12:22           ` Petr Mladek
  2016-10-03  7:53         ` Sergey Senozhatsky
  2016-10-04 14:52         ` Petr Mladek
  2 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-01  2:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (09/30/16 13:15), Petr Mladek wrote:
[..]
> > and the decision was to keep `unsigned long flags' on stack in the
> > alt_enter/exit caller. besides in most of the cases we already have
> > it (in vprintk_emit() and console_unlock()).
> 
> I would pass the pointer to flags as alt_enter() parameter.

can do. or _maybe_ leave the IRQ manipulation as it is. because there
are different tricks. take a look at console_unlock() and so on.

> > do you mean that, once alt_printk is done properly, we can drop
> > printk_deferred()? I was thinking of it, but decided not to
> > mention/touch it in this patch set.
> 
> My understanding is the following:
> 
> The difference between normal printk() and printk_deferred() is
> that the other does not call console_trylock()/console_unlock().
> It means that printk_deferred() can avoid recursion only from these
> two calls.

yes.

> printk_deferred() is used only in scheduler and timekeeping code.
> Therefore it prevents only limited number of possible recursions
> and deadlocks at the moment.
> 
> This patch guards most of the two calls a more generic way.
> The redirected parts prevent recursion not only to into the
> code guarded by console_sem but also into parts guarded
> by lockbuf_lock.

yes. I'm considering extending it to "non-recursive printk" cases
sometime in the future. it's easy to protect logbuf_lock, but not
so easy to protect the sleeping console_lock().

the cases I'm talking of are (for instance):

  devkmsg_open()
   raw_spin_lock_irq(&logbuf_lock)
    spin_dump()
     printk()
      raw_spin_lock_irq(&logbuf_lock)  << deadlock

entering alt_printk mode each time we take the `logbuf_lock' sort
of makes sense. can be done later, don't want to overload this patch set.
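
if/when we get there, something along these lines could make it mostly
mechanical (just a sketch, re-using the enter/exit calls from this series):

	/*
	 * take logbuf_lock only with printk() redirected, so that a
	 * WARN()/spin_dump() fired under the lock lands in the per-CPU
	 * buffer instead of recursing into logbuf_lock.
	 */
	#define logbuf_lock_irqsave(flags)		\
		do {					\
			local_irq_save(flags);		\
			alt_printk_enter();		\
			raw_spin_lock(&logbuf_lock);	\
		} while (0)

	#define logbuf_unlock_irqrestore(flags)		\
		do {					\
			raw_spin_unlock(&logbuf_lock);	\
			alt_printk_exit();		\
			local_irq_restore(flags);	\
		} while (0)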

addressing the sleeping console_sem functions is not super hard in general.
we would just have to put alt_printk_enter/exit into scheduler code, or
at least into semaphore code. which can be hard to sell. there are
gazillions of semaphores and we need to protect only one of them, but
would have to do alt_printk_enter/exit for every one of them.

> By other words, this patch is supposed to handle a superset
> of the deadlocks that are currently prevented by printk_deferred().
> If this is true, we do not longer need printk_deferred().

yes, I suppose so.
the only difference here is that printk_deferred() immediately puts the
message into logbuf (well, *if* it can lock the `logbuf_lock'), while
vprintk_alt() puts it into per-cpu buffer first and needs an irq work in
that CPU to flush it to logbuf. OTOH, vprintk_alt() does not depend on
`logbuf_lock'.
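
roughly (a sketch; in the actual patch the irq_work queueing may already
sit inside alt_printk_log_store(), the point is only that the flush is
deferred to an irq_work on the same CPU):

	static int vprintk_alt(const char *fmt, va_list args)
	{
		struct alt_printk_seq_buf *s = this_cpu_ptr(&alt_print_seq);
		int add = alt_printk_log_store(s, fmt, args);

		/* flushed to logbuf later, from this CPU's irq_work */
		if (add)
			irq_work_queue(&s->work);

		return add;
	}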

> The only question is if this patch guards enough parts of
> console_try_lock()/console_unlock() to handle the superset
> of the possible deadlocks.
> 
> I see that it does not guard two up_console_sem() calls
> from console_unlock(). But this can be fixed in the next
> version.
>
> Or is there any other catch that I do not see at the moment?

it's a bit tricky. we break from the printing loop only with the logbuf_lock
spin_lock held, irqs disabled and in alt_printk mode. so everything
after the printing loop is still protected, up until

	raw_spin_lock(&logbuf_lock);
	retry = console_seq != log_next_seq;
	raw_spin_unlock(&logbuf_lock);
	alt_printk_exit();
	local_irq_restore(flags);

> In each case, getting rid of printk_deferred() could be
> a fantastic selling point for this patchset.

agree.

I also suspect that we can eliminate the recursion detection logic
in vprintk_emit().

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
  2016-09-30 11:27     ` Petr Mladek
@ 2016-10-01  3:02       ` Sergey Senozhatsky
  2016-10-04 11:35         ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-01  3:02 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (09/30/16 13:27), Petr Mladek wrote:
> > > > 	This patch set extends a lock-less NMI per-cpu buffers idea to
> > > > handle recursive printk() calls. The basic mechanism is pretty much the
> > > > same -- at the beginning of a deadlock-prone section we switch to lock-less
> > > > printk callback, and return back to a default printk implementation at the
> > > > end; the messages are getting flushed to a logbuf buffer from a safer
> > > > context.
> > > 
> > > I was skeptical but I really like this way now.
> > >
> > > The switching of the buffers is a bit hairy in this version but I
> > > think that we could make it much better.
> > > 
> > > Other than that it looks like a big win. It kills a lot of
> > > printk-related pain points. And it will not be that complicated
> > > after all.
> > 
> > many thanks for looking at this train wreck.
> > 
> > so, like I said, it addresses printk()-recursion in *ideally* quite
> > a minimalistic way -- just several alt_printk_enter/exit calls in
> > printk.c, without ever touching any other parts of the kernel.
> > 
> > gunning down printk deadlocks in general, however, requires much more
> > effort; or even a completely different approach.
> > 
> > a) a lock-less printk() by default
> >    um, `#define printk alt_printk'. but this will break printk() from irq.
> >    and the ordering of messages from per-cpu buffers may be far from correct.
> 
> Well, the current vprintk_nmi() is lockless. The alternative printk()
> is going to use the same code, so it will be lockless as well. It
> means that even this patchset is supposed to avoid all possible
> deadlocks via printk() calls.

I meant that printk-recursion and printk-deadlock can be different
scenarios. deadlocks are harder to handle

 devkmsg_open()
  raw_spin_lock_irq(&logbuf_lock)
   spin_dump()
    printk()
     raw_spin_lock_irqsave(&logbuf_lock)

this one can be handled by alt_printk.

 devkmsg_open()
  local_irq_save();
  alt_printk_enter()
  raw_spin_lock(&logbuf_lock)
   spin_dump()
    printk()
     vprintk_alt()

but there are some that can't be handled solely in printk.c

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-30 11:15       ` Petr Mladek
  2016-10-01  2:48         ` Sergey Senozhatsky
@ 2016-10-03  7:53         ` Sergey Senozhatsky
  2016-10-04 14:52         ` Petr Mladek
  2 siblings, 0 replies; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-03  7:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (09/30/16 13:15), Petr Mladek wrote:
[..]
> > and the decision was to keep `unsigned long flags' on stack in the
> > alt_enter/exit caller. besides in most of the cases we already have
> > it (in vprintk_emit() and console_unlock()).
> 
> I would pass the pointer to flags as alt_enter() parameter.

I think I'll do something like this

--

void __alt_printk_enter(void);
void __alt_printk_exit(void);

#define alt_printk_enter(flags)			\
	do {					\
		local_irq_save(flags);		\
		__alt_printk_enter();		\
	} while (0)

#define alt_printk_exit(flags) 			\
	do { 					\
		__alt_printk_exit();		\
		local_irq_restore(flags);	\
	} while (0)

--

seems that this mostly will fit our needs in printk().
for anything that does not fit -- we have __alt_printk_enter()/exit().
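
so e.g. the console_trylock() hunk from this patch would shrink to
something like (sketch):

	unsigned long flags;
	int lock_failed;

	alt_printk_enter(flags);
	lock_failed = down_trylock_console_sem();
	alt_printk_exit(flags);

	if (lock_failed)
		return 0;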


and I'll also extend alt_printk to cover some additional paths.

I'm also thinking about your rename proposal, but haven't found
a perfect name yet.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls
  2016-10-01  3:02       ` Sergey Senozhatsky
@ 2016-10-04 11:35         ` Petr Mladek
  0 siblings, 0 replies; 35+ messages in thread
From: Petr Mladek @ 2016-10-04 11:35 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Sat 2016-10-01 12:02:51, Sergey Senozhatsky wrote:
> On (09/30/16 13:27), Petr Mladek wrote:
> > > > > 	This patch set extends a lock-less NMI per-cpu buffers idea to
> > > > > handle recursive printk() calls. The basic mechanism is pretty much the
> > > > > same -- at the beginning of a deadlock-prone section we switch to lock-less
> > > > > printk callback, and return back to a default printk implementation at the
> > > > > end; the messages are getting flushed to a logbuf buffer from a safer
> > > > > context.
> > > > 
> > > > I was skeptical but I really like this way now.
> > > >
> > > > The switching of the buffers is a bit hairy in this version but I
> > > > think that we could make it much better.
> > > > 
> > > > Other than that it looks like a big win. It kills a lot of
> > > > printk-related pain points. And it will not be that complicated
> > > > after all.
> > > 
> > > many thanks for looking at this train wreck.
> > > 
> > > so, like I said, it addresses printk()-recursion in *ideally* quite
> > > a minimalistic way -- just several alt_printk_enter/exit calls in
> > > printk.c, without ever touching any other parts of the kernel.
> > > 
> > > gunning down printk deadlocks in general, however, requires much more
> > > effort; or even a completely different approach.
> > > 
> > > a) a lock-less printk() by default
> > >    um, `#define printk alt_printk'. but this will break printk() from irq.
> > >    and the ordering of messages from per-cpu buffers may be far from correct.
> > 
> > Well, the current vprintk_nmi() is lockless. The alternative printk()
> > is going to use the same code, so it will be lockless as well. It
> > means that even this patchset is supposed to avoid all possible
> > deadlocks via printk() calls.
> 
> I meant that printk-recursion and printk-deadlock can be different
> scenarios. deadlocks are harder to handle
> 
>  devkmsg_open()
>   raw_spin_lock_irq(&logbuf_lock)
>    spin_dump()
>     printk()
>      raw_spin_lock_irqsave(&logbuf_lock)
> 
> this one can be handled by alt_printk.
> 
>  devkmsg_open()
>   local_irq_save();
>   alt_printk_enter()
>   raw_spin_lock(&logbuf_lock)
>    spin_dump()
>     printk()
>      vprintk_alt()
> 
> but there are some that can't be handled solely in printk.c

Do you have an example of the still problematic code, please?
vprintk_alt() must be lockless because the same code is used
also in NMI context. If it takes a lock, it is a bug.
Therefore it should not cause a deadlock.

The only problem might be an infinite loop. But the loop should
break once the per-CPU buffer is full. We only need to make sure
that there is no printk() called before the check for the full
buffer. But this reduces the error-prone part of the code to
a minimum. Therefore it should be bearable.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-01  2:48         ` Sergey Senozhatsky
@ 2016-10-04 12:22           ` Petr Mladek
  2016-10-05  1:36             ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-10-04 12:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Sat 2016-10-01 11:48:29, Sergey Senozhatsky wrote:
> On (09/30/16 13:15), Petr Mladek wrote:
> > > do you mean that, once alt_printk is done properly, we can drop
> > > printk_deferred()? I was thinking of it, but decided not to
> > > mention/touch it in this patch set.
> > 
> > My understanding is the following:
> > 
> > The difference between normal printk() and printk_deferred() is
> > that the other does not call console_trylock()/console_unlock().
> > It means that printk_deferred() can avoid recursion only from these
> > two calls.
> 
> yes.
> 
> > printk_deferred() is used only in scheduler and timekeeping code.
> > Therefore it prevents only limited number of possible recursions
> > and deadlocks at the moment.
> > 
> > This patch guards most of the two calls a more generic way.
> > The redirected parts prevent recursion not only to into the
> > code guarded by console_sem but also into parts guarded
> > by lockbuf_lock.
> 
> yes. I'm considering to extend it to "non-recursive printk" cases
> sometime in the future. it's easy to protect lockbuf_lock, but not
> so easy to protect sleeping console_lock().
> 
> the cases I'm talking of are (for instance):
> 
>   devkmsg_open()
>    raw_spin_lock_irq(&logbuf_lock)
>     spin_dump()
>      printk()
>       raw_spin_lock_irq(&logbuf_lock)  << deadlock

I have finally got it. You are right, there are still some other
possible deadlocks.

But these are not protected at the moment. Neither printk_deferred()
nor logbuf_cpu prevents this. Therefore this is not a reason to keep
either printk_deferred() or logbuf_cpu.


> entering to alt_printk mode each time we take the `logbuf_lock' sort
> of makes sense. can be done later, don't want to overload this patch set.

I agree that it does not make sense to solve these other problems in
this patchset.


> addressing sleeping console_sem function is not super hard in general.
> we just would have to put alt_printk_enter/exit into scheduler code, or
> at least into semaphore code. which can be hard to sell. there are
> gazillions of semaphores and we need to protect only one of them, but
> have to do alt_printk_enter/exit for every.

Yeah, so a generic solution for console_sem might be rather hard.
I would not complicate this patchset with it.


> > By other words, this patch is supposed to handle a superset
> > of the deadlocks that are currently prevented by printk_deferred().
> > If this is true, we do not longer need printk_deferred().
> 
> yes, I suppose so.
> the only difference here is that printk_deferred() immediately puts the
> message into logbuf (well, *if* it can lock the `logbuf_lock'), while
> vprintk_alt() puts it into per-cpu buffer first and needs an irq work in
> that CPU to flush it to logbuf. OTOH, vprintk_alt() does not depend on
> `logbuf_lock'.

On the other hand, the per-CPU buffer will include only error
messages from the printk() code. This is why I am fine with
the extra buffer. The only other option these days is to use
printk_deferred(). And as you said, that is error-prone and hard
to maintain.


> > The only question is if this patch guards enough parts of
> > console_try_lock()/console_unlock() to handle the superset
> > of the possible deadlocks.
> > 
> > I see that it does not guard two up_console_sem() calls
> > from console_unlock(). But this can be fixed in the next
> > version.
> >
> > Or is there any other catch that I do not see at the moment?
> 
> it's a bit tricky. we break from printing loop only with logbuf_lock
> spin_lock locked, irqs disabled and in alt_printk mode. so everything
> after the printing loop is still protected, up until
> 
> 	raw_spin_lock(&logbuf_lock);
> 	retry = console_seq != log_next_seq;
> 	raw_spin_unlock(&logbuf_lock);
> 	alt_printk_exit();
> 	local_irq_restore(flags);

Yes, but we are safe to call normal printk() at this point.
logbuf_lock is released => no danger of a deadlock.
console_sem is taken but this is not a problem. printk()
will do console_trylock() protected by alt_printk_enter()/exit().
Therefore the trylock will fail but it cannot cause a deadlock.

We just need to replace

	if (retry && console_trylock())
		goto again;

with a safe variant, something like

	if (retry) {
		local_irq_save(flags);
		alt_printk_enter();
		lock_failed = console_trylock();
		alt_printk_exit();
		local_irq_restore(flags);

		if (!lock_failed)
			goto again;
	}

Or do I miss anything?

> > In each case, getting rid of printk_deferred() could be
> > a fantastic selling point for this patchset.
> 
> agree.

I believe that it is doable and worth try.

I am going to look at the second version of the patchset.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-09-30 11:15       ` Petr Mladek
  2016-10-01  2:48         ` Sergey Senozhatsky
  2016-10-03  7:53         ` Sergey Senozhatsky
@ 2016-10-04 14:52         ` Petr Mladek
  2016-10-05  1:27           ` Sergey Senozhatsky
  2 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-10-04 14:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Fri 2016-09-30 13:15:46, Petr Mladek wrote:
> On Fri 2016-09-30 10:15:44, Sergey Senozhatsky wrote:
> > On (09/29/16 15:00), Petr Mladek wrote:
> > [..]
> > > > @@ -1791,7 +1791,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> > > >  		zap_locks();
> > > >  	}
> > > >  
> > > > -	lockdep_off();
> > > > +	alt_printk_enter();
> > > 
> > > IMHO, we could not longer enter vprintk_emit() recursively. The same
> > > section that was guarded by logbuf_cpu is guarded by
> > > alt_printk_enter()/exit() now.
> > 
> > you might be very right here. I'll take a look.
> > 
> > > IMHO, we could remove all the logic around the recursion. Then we
> > > could even disable/enable irqs inside alt_printk_enter()/exit().
> > 
> > I was thinking of doing something like this; but that would require
> > storing 'unsigned long' flags in per-cpu data
> > 
> > 	alt_enter()
> > 	{
> > 		unsinged long flags;
> > 
> > 		local_irq_save(flags);
> > 		ctx = this_cpu_ptr();
> > 		ctx->flags = flags;
> > 		...
> > 	}
> > 
> > 	alt_exit()
> > 	{
> > 		ctx = this_cpu_ptr();
> > 		...
> > 		local_irq_restore(ctx->flags);
> > 	}
> >
> > and the decision was to keep `unsigned long flags' on stack in the
> > alt_enter/exit caller. besides in most of the cases we already have
> > it (in vprintk_emit() and console_unlock()).
> 
> I would pass the pointer to flags as alt_enter() parameter.
> > 
> > but I can certainly hide these details in alt_enter/exit.
> >
> > > > @@ -2479,7 +2490,9 @@ void console_unlock(void)
> > > >  	 */
> > > >  	raw_spin_lock(&logbuf_lock);
> > > >  	retry = console_seq != log_next_seq;
> > > > -	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> > > > +	raw_spin_unlock(&logbuf_lock);
> > > > +	alt_printk_exit();
> > > > +	local_irq_restore(flags);
> > > 
> > > We should mention that this patch makes an obsolete artefact from
> > > printk_deferred(). It opens the door for another big cleanup and
> > > relief.
> > 
> > do you mean that, once alt_printk is done properly, we can drop
> > printk_deferred()? I was thinking of it, but decided not to
> > mention/touch it in this patch set.
> 
> My understanding is the following:
> 
> The difference between normal printk() and printk_deferred() is
> that the other does not call console_trylock()/console_unlock().
> It means that printk_deferred() can avoid recursion only from these
> two calls.
> 
> printk_deferred() is used only in scheduler and timekeeping code.
> Therefore it prevents only limited number of possible recursions
> and deadlocks at the moment.
> 
> This patch guards most of the two calls a more generic way.
> The redirected parts prevent recursion not only to into the
> code guarded by console_sem but also into parts guarded
> by lockbuf_lock.
> 
> By other words, this patch is supposed to handle a superset
> of the deadlocks that are currently prevented by printk_deferred().
> If this is true, we do not longer need printk_deferred().
> 
> The only question is if this patch guards enough parts of
> console_try_lock()/console_unlock() to handle the superset
> of the possible deadlocks.
> 
> I see that it does not guard two up_console_sem() calls
> from console_unlock(). But this can be fixed in the next
> version.
> 
> Or is there any other catch that I do not see at the moment?

And there is :-( The above logic looked at the problem only from
one side. It was about errors starting from the printk()
code itself, for example:

 vprintk_emit()
  console_unlock()
   up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
    wake_up_process()
     try_to_wake_up()
      ttwu_queue()
       ttwu_activate()
        activate_task()
         enqueue_task()
          enqueue_task_fair()
           cfs_rq_of()
            task_of()
             WARN_ON_ONCE(!entity_is_task(se))
              vprintk_emit()
               console_trylock()
                down_trylock()
                 raw_spin_lock_irqsave(&sem->lock, flags)
                 ^^^^ deadlock


But it does not solve errors starting in the scheduler or
timekeeping code. I mean:

   any_function_waking_process()
    wake_up_process()
     try_to_wake_up()  // takes &p->pi_lock
      ttwu_queue()
       ttwu_activate()
        activate_task()
         enqueue_task()
          enqueue_task_fair()
           cfs_rq_of()
            task_of()
             WARN_ON_ONCE(!entity_is_task(se))
              vprintk_emit()
               console_trylock()   // success
	       console_unlock()
	         up_console_sem()
		  up()
		    __up()
		     wake_up_process()
		      try_to_wake_up()
		       raw_spin_lock_irqsave(&p->pi_lock, flags);
		       ^^^^ deadlock

The only thing that might help here is to call
alt_printk_enter()/exit() in wake_up_process() itself. Otherwise,
we would still need to keep the printk_deferred() stuff.

In other words, we might need to put alt_printk_enter()/exit()
into the scheduler and timekeeping code. In theory it might
be easier to maintain than the scattered printk_deferred() calls.
But there might be some catches because we need to disable
the interrupts, ...
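
Just to illustrate, wrapping the wakeup path could look like this
(only a sketch using the enter/exit calls from this series; I am not
saying we should merge it in this form):

	int wake_up_process(struct task_struct *p)
	{
		unsigned long flags;
		int ret;

		/*
		 * A WARN()/printk() fired under p->pi_lock now lands in the
		 * per-CPU buffer and cannot come back into try_to_wake_up()
		 * via console_unlock()/up().
		 */
		local_irq_save(flags);
		alt_printk_enter();
		ret = try_to_wake_up(p, TASK_NORMAL, 0);
		alt_printk_exit();
		local_irq_restore(flags);

		return ret;
	}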

Sigh, this 2nd scenario is much more likely than the 1st one.
I guess that warnings in the scheduler/timekeeping code
will be triggered outside printk() most of the time.

It means that this approach might be much harder to sell
after all :-(

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-04 14:52         ` Petr Mladek
@ 2016-10-05  1:27           ` Sergey Senozhatsky
  2016-10-05  9:50             ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-05  1:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (10/04/16 16:52), Petr Mladek wrote:
> > 
> > Or is there any other catch that I do not see at the moment?
> 
> And there is :-( The above logic looked at the problem only from
> one side. It was about errors starting from the printk()
> code itself, for example:

yes, like I said - printk recursion and printk deadlock are different
things. and recursion cases are a subset of deadlock cases.

> The only thing that might help here is to call
> alt_printk_enter()/exit() in wake_up_process() itself. Otherwise,
> we still would need to keep the printk_deferred() stuff.

yes.
or
- combine alt_printk and DEFERRED_WARN/etc.
or
- rewrite printk() to be lock-less by default (for all invocations).

> By other words, we might need to put alt_printk_enter()/exit()
> into the scheduler and timekeeping code. In theory it might
> be easier to maintain than the separated printk_deferred() calls.
> But there might be some catches because we need to disable
> the interrupts, ...

right. and I have some doubts that people will be willing to put
alt_printk_enter/exit into those hot paths.


> Sigh, this 2nd scenario is much more likely than the 1st one.
> I guess that warnings in the scheduler/timekeeping code
> will be triggered outside printk() most of the time.

hm, maybe. but the reports we received so far start from printk()
and end up in printk() - IOW, recursion.

> It means that this approach might be much harder to sell
> after all :-(

well, it solves a number of problems that the existing implementation
cannot handle.

would have been nice to cover all of the cases, but that's a bit hard.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-04 12:22           ` Petr Mladek
@ 2016-10-05  1:36             ` Sergey Senozhatsky
  2016-10-05 10:18               ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-05  1:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (10/04/16 14:22), Petr Mladek wrote:
[..]
> 	if (retry && console_trylock())
> 		goto again;
> 
> with a safe variant, something like
> 
> 	if (retry) {
> 		local_irq_save(flags);
> 		alt_printk_enter();
> 		lock_failed = console_trylock();
> 		alt_printk_exit();
> 		local_irq_restore(flags);
> 
> 		if (!lock_failed)
> 			goto again;
> 	}
> 
> Or do I miss anything?

nope, you don't. that's close to what I do in v3.

> I am going to look at the second version of the patchset.

thanks a lot for your review!

I'll refresh the patch set a bit later this week. I think it's more
or less in shape now.... well, still under the old name: alt_printk.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-05  1:27           ` Sergey Senozhatsky
@ 2016-10-05  9:50             ` Petr Mladek
  2016-10-06  4:22               ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-10-05  9:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Wed 2016-10-05 10:27:14, Sergey Senozhatsky wrote:
> On (10/04/16 16:52), Petr Mladek wrote:
> > > 
> > > Or is there any other catch that I do not see at the moment?
> > 
> > And there is :-( The above logic looked at the problem only from
> > one side. It was about errors starting from the printk()
> > code itself, for example:
> 
> yes, like I said - printk recursion and printk deadlock are different
> things. and recursion cases are a subset of deadlock cases.

I see.

> > The only thing that might help here is to call
> > alt_printk_enter()/exit() in wake_up_process() itself. Otherwise,
> > we still would need to keep the printk_deferred() stuff.
> 
> yes.
> or
> - combine alt_printk and DEFERRED_WARN/etc.

The question is whether alt_printk brings any win after all; see below.


> or
> - rewrite printk() to be lock-less by default (for all invocations).

We already have it and it is called trace_printk(). But it is very
tricky and has some limitations. For example, it does not support
random parallel readers. Also printing the trace log is noticeably
slow.


> > By other words, we might need to put alt_printk_enter()/exit()
> > into the scheduler and timekeeping code. In theory it might
> > be easier to maintain than the separated printk_deferred() calls.
> > But there might be some catches because we need to disable
> > the interrupts, ...
> 
> right. and I have some doubts that people will be willing to put
> alt_printk_enter/exit into those hot paths.

I have the same doubts.


> > Sigh, this 2nd scenario is much more likely than the 1st one.
> > I guess that warnings in the scheduler/timekeeping code
> > will be triggered outside printk() most of the time.
> 
> hm. may be. but the reports we received so far starts from printk()
> and end up in printk() - IOW, recursion.

My statement might have been too strong. Well, my thinking was
the following:

I am not aware of any real-life bug reports caused by recursion
inside the logbuf_lock guarded section. I guess that it is because
the most sensitive one is guarded by that logbuf_cpu check.
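
For reference, this is roughly the logbuf_cpu check in vprintk_emit()
that I mean (an excerpt, cited from memory, so the details might be
slightly off):

	this_cpu = smp_processor_id();
	if (unlikely(logbuf_cpu == this_cpu)) {
		/* printk() recursed into itself on this CPU */
		if (!oops_in_progress && !lockdep_recursing(current)) {
			/* flag the recursion and bail out to avoid the deadlock */
			recursion_bug = true;
			local_irq_restore(flags);
			return 0;
		}
		/* we are crashing anyway, re-init the locks and go on */
		zap_locks();
	}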

I am not aware of any generic recursions inside the console code.
In fact, they are not easily possible because we use console_trylock().
It means that we do not call the console code if it is already being
handled.

We are basically down to the recursion/deadlock caused by the
wake_up_process() call. And there are many more such calls outside
printk().

In fact, I am aware of only one report. It was related to the
async printk patchset, the added wake_up_process(), and the RT
scheduler being used. This one started from printk() almost by
definition.


> > It means that this approach might be much harder to sell
> > after all :-(
> 
> well, it solves a number of problems that the existing implementation
> cannot handle.

Please, provide a summary. I wonder if these are real life problems.

Note that we need to put aside all problems that are solvable
with printk_deferred(). It seems that printk_deferred() will
need to stay because it avoids the deadlock caused by
scheduler/timekeeping code locks. In other words, if there
is a missing printk_deferred() we need to put it there
anyway because the same code might first get called
outside printk().

Or do I miss something?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-05  1:36             ` Sergey Senozhatsky
@ 2016-10-05 10:18               ` Petr Mladek
  0 siblings, 0 replies; 35+ messages in thread
From: Petr Mladek @ 2016-10-05 10:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Wed 2016-10-05 10:36:57, Sergey Senozhatsky wrote:
> On (10/04/16 14:22), Petr Mladek wrote:
> [..]
> > 	if (retry && console_trylock())
> > 		goto again;
> > 
> > with a safe variant, something like
> > 
> > 	if (retry) {
> > 		local_irq_save(flags);
> > 		alt_printk_enter();
> > 		lock_failed = console_trylock();
> > 		alt_printk_exit();
> > 		local_irq_restore(flags);
> > 
> > 		if (!lock_failed)
> > 			goto again;
> > 	}
> > 
> > Or do I miss anything?
> 
> nope, you don't. that's close to what I do in v3.
> 
> > I am going to look at the second version of the patchset.
> 
> thanks a lot for your review!
> 
> I'll refresh the patch set a bit later this week. I think it's more
> or less in shape now.... well, still under the old name: alt_printk.

Note that I have stopped reviewing v2 after realizing that it
did not help to get rid of printk_deferred().

I would like to revisit the usefulness of this approach first.
The motivation at
https://lkml.kernel.org/r/20160928011845.GA753@swordfish
no longer flies. We will still need to maintain all
the printk_deferred()/WARN_*DEFERRED calls.

Also let me reply to the mail
https://lkml.kernel.org/r/20160930004832.GA547@swordfish
The examples do not look real to me.

I am not sure what the real set of fixed problems is
at the moment.

I am sad to write this. I was really optimistic
about this patchset.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-05  9:50             ` Petr Mladek
@ 2016-10-06  4:22               ` Sergey Senozhatsky
  2016-10-06 11:32                 ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-06  4:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (10/05/16 11:50), Petr Mladek wrote:
[..]
> > well, it solves a number of problems that the existing implementation
> > cannot handle.
> 
> Please, provide a summary. I wonder if these are real life problems.

anything that starts from printk().
I'm trying to address printk() recursion only, by handling the top-most
recursion - printk -> printk, which causes all of the dependent recursions:
	printk -> spin_lock(logbuf) -> printk() -> spin_lock(logbuf)
	printk -> sem->lock() -> printk() -> sem->lock()
and so on.

I'm not building a lock dependency graph/etc.


the existing recursion detection logic is racy and quite limited in
scope: not only does it protect just a tiny bit of printk, it doesn't
even protect it fully. the very moment we do `logbuf_cpu = UINT_MAX;'
it's over - we still own the lock, but we don't remember it anymore.
a spin_dump() from raw_spin_unlock(&logbuf_lock) will kill us.
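
roughly the window I'm talking about (the unlock sequence at the end
of vprintk_emit(), annotated; paraphrased from memory):

	logbuf_cpu = UINT_MAX;		/* recursion detection is off from here,
					 * but we still hold logbuf_lock */
	raw_spin_unlock(&logbuf_lock);	/* the debug checks run while the lock is
					 * still owned; if they fire, spin_dump()
					 * calls printk(), which takes logbuf_lock
					 * again -> silent deadlock */
	lockdep_on();
	local_irq_restore(flags);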


** a very quick list **
// a mixed list of theoretical and real problems can also be found
// in the patch set cover letter.


public reports:

1) some patches/reports from Byungchul Park

the first one in my mailbox (seems to be)
https://marc.info/?l=linux-kernel&m=145406977822244
20160129121545.GH31266@X58A-UD3R

or here
https://marc.info/?l=linux-kernel&m=145570036513749

in his case, I believe, the lock was fine (not corrupted). it's just that
he had a spin_lock lockup on the console semaphore spin_lock coming from:

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;

	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);
	}
	/* lockup suspected: */
	spin_dump(lock, "lockup suspected");


which ended up in an infinite recursion.

https://marc.info/?l=linux-kernel&m=145449014308161



2) a report from Viresh Kumar.

frankly, we were lucky to have Viresh on this: a less experienced developer
would probably have given up. So, probably, would a developer with no
appropriate hardware (jtag debugger/serial console/etc.). and I don't know
how long we would have spent meditating to figure out it was a WARN from
timekeeping. we'd better be 'more prepared'.


=======
reports (unique occurrences only) that I have in internal bugzilla

4) sleeping function called from inside the logbuf_lock section,
   which resulted in a spin_dump() call from spin_unlock(&logbuf_lock)
   when:
	a) we still owned the logbuf_lock
	b) yet logbuf_cpu was already reset, so printk recursion
	detection was helpless

   'a + b' leaves us no chance to survive.

5) ARM specific
   an imprecise abort (http://infocenter.arm.com/help/topic/com.arm.doc.faqs/14809.html)
   hit the CPU while it was holding the printk-related spin-lock.
   that deadlocked the system, because the abort handler attempted to
   printk() a message.


6) logbuf_lock corruption
   well, no cookies for us. un-fixable at the moment. we can probably
   do something about it: have a spinlock-debug bool function that would
   tell us whether the lock is corrupted, so we can re-init logbuf_lock,
   perhaps.
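
   fwiw, re-initing the lock is more or less what zap_locks() in
   printk.c already does when printk() recursion is detected during
   an oops (paraphrased from memory, details trimmed):

	static void zap_locks(void)
	{
		debug_locks_off();
		/* if a crash is occurring, make sure we can't deadlock */
		raw_spin_lock_init(&logbuf_lock);
		/* and make sure that we print immediately */
		sema_init(&console_sem, 1);
	}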


> Note that we need to put aside all problems that are solvable
> with printk_deferred(). It seems that printk_deferred() will
> need to stay because it avoids the deadlock caused by
> scheduler/timekeeping code locks.

agree. printk_deferred() takes only one lock and avoids the console_unlock()
loop. as long as logbuf_lock is not in its way, printk_deferred() may be
helpful.
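
roughly what printk_deferred() boils down to (cited from memory):
passing LOGLEVEL_SCHED makes vprintk_emit() skip the console_trylock()/
console_unlock() part and defer the flushing to irq_work, so logbuf_lock
is the only printk lock taken in the caller's context:

	int printk_deferred(const char *fmt, ...)
	{
		va_list args;
		int r;

		preempt_disable();
		va_start(args, fmt);
		/* LOGLEVEL_SCHED == "do not touch console_sem here" */
		r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
		va_end(args);
		preempt_enable();

		return r;
	}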

> By other words, if there is a missing printk_deferred() we
> need to put it there anyway because the same code might get
> first called outside printk().

right. and I'm not addressing this. there are just too many locks
that can be acquired out of order. not only timekeeping and sched
locks, but any of the serial console locks and so on. we need something
like a lockdep lock graph here that would not just report the issues (any
printk() can result in a deadlock when we detect that at least one of the
printk related locks was acquired out of order), but instead would
somehow selectively fix/work around them.


can we somehow, transparently for the rest of the system (just in
printk()), detect that we are in a potentially risky situation? hmm,
I don't know...

something *very* radical?

	vprintk_func()
	{
		if (this_cpu_read(alt_printk_ctx) & ALT_PRINTK_NMI_CONTEXT_MASK)
			return vprintk_nmi(fmt, args);

		if (in_atomic() ||
		/* ^^^^^^^^^^^^ */
		    this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK)
			return vprintk_alt(fmt, args);

		return vprintk_default(fmt, args);
	}

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-06  4:22               ` Sergey Senozhatsky
@ 2016-10-06 11:32                 ` Petr Mladek
  2016-10-10  4:09                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-10-06 11:32 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Thu 2016-10-06 13:22:48, Sergey Senozhatsky wrote:
> On (10/05/16 11:50), Petr Mladek wrote:
> [..]
> > > well, it solves a number of problems that the existing implementation
> > > cannot handle.
> > 
> > Please, provide a summary. I wonder if these are real life problems.
> 
> 1) some pathces/reports from Byungchul Park
> 2) a report from Viresh Kumar.
> 4) sleeping function called from inside logbuf lock
> 5) ARM specific
> 6) logbuf_lock corruption

It is great that you have such a list at hand. It might help
to push this solution.

I actually have one more reason for this approach:

It seems that we will need to keep printk_deferred()/WARN_*DEFERRED().
We do not know about a better solution for the deadlocks caused
by scheduler/timekeeping/console_drivers locks.

The pain is that the list of affected locations is hard to maintain.
It would definitely help if such problems are reported by lockdep
in advance. But lockdep is disabled because it creates the deadlock
on its own.

The alternative printk() allows us to enable lockdep and effectively
hunt for other possible bugs.

Also the printk context per-CPU variable perfectly fits the
lockdep approach. It will allow us to monitor in which printk context
all the other locks are taken and to detect possible problems.


> can we somehow transparently for the rest of the system (just in
> printk()) detect that we are in a potentially risky situation? hmm,
> I don't know...
> 
> something *very* radical?
> 
> 	vprintk_func()
> 	{
> 		if (this_cpu_read(alt_printk_ctx) & ALT_PRINTK_NMI_CONTEXT_MASK)
> 		return vprintk_nmi(fmt, args);
> 
> 		if (in_atomic() ||
> 		/* ^^^^^^^^^^^^ */
> 			this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK)
> 			return vprintk_alt(fmt, args);
> 
> 		return vprintk_default(fmt, args);
> 	}

This would affect too many messages. If the error is too serious,
there is a risk that the messages from the alternative per-CPU
buffers will not appear in the main log buffer and the console.
IMHO, this is acceptable for printk-related errors. But people
would complain if other bugs are harder to debug because
the error messages were hidden.

I am going to continue reviewing v2 of the patch set.

BTW: I would like to ask you to slow down a bit. More versions
of such a non-trivial patchset, sent within a few days,
are far too much. I have some other tasks that I need to work on.
Also I would like to hear opinion from other people. Note that
many people are busy with the merge window at the moment.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-06 11:32                 ` Petr Mladek
@ 2016-10-10  4:09                   ` Sergey Senozhatsky
  2016-10-10 11:17                     ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-10  4:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (10/06/16 13:32), Petr Mladek wrote:
> On Thu 2016-10-06 13:22:48, Sergey Senozhatsky wrote:
> > On (10/05/16 11:50), Petr Mladek wrote:
> > [..]
> > > > well, it solves a number of problems that the existing implementation
> > > > cannot handle.
> > > 
> > > Please, provide a summary. I wonder if these are real life problems.
> > 
> > 1) some pathces/reports from Byungchul Park
> > 2) a report from Viresh Kumar.
> > 4) sleeping function called from inside logbuf lock
> > 5) ARM specific
> > 6) logbuf_lock corruption
> 
> It is great that you have such a list in hands. It might help
> to push this solution.
> 
> I actually have one more reason for this approach:
> 
> It seems that we will need to keep printk_deferred()/WARN_*DEFERRED().
> We do not know about a better solution for the deadlocks caused
> by scheduler/timekeeping/console_drivers locks.

yes, seems so.

> The pain is that the list of affected locations is hard to maintain.
> It would definitely help if such problems are reported by lockdep
> in advance. But lockdep is disabled because it creates the deadlock
> on its own.

right. another issue is that those potentially recursive printk/WARN_ON
calls may be coming from error-handling branches, not all of which are
easily reachable for automated solutions. so in order to find out there
is a problem we must hit it [in some cases].

it may look like lockdep *probably* can report the issues via 'safe' printk,
but that's a notably huge behavior breakage -- if a lockdep report comes from
an about-to-deadlock irq handler, then we won't see anything from that CPU
unless there is a panic/nmi panic.

so it probably has to be semi-automatic/semi-manual:
- add might_printk() that would acquire/release console sem; or
  logbuf_lock (which is probably even better)
- find all functions that do printk/WARN in kernel/time and kernel/sched
- add might_printk() to those functions (just like might_sleep())
- run the kernel
- ...
- profit

#ifdef CONFIG_VALIDATE_PRINTK_CALLS

#define might_printk()							\
	do {								\
		if (!printk_in_safe_mode()) {				\
			unsigned long flags;				\
									\
			printk_safe_enter(flags);			\
			mutex_acquire(&console_lock_dep_map...);	\
			mutex_release(&console_lock_dep_map...);	\
			/*						\
			 * or printk_deferred("");			\
			 */						\
			printk_safe_exit(flags);			\
		}							\
	} while (0)

#else

#define might_printk()

#endif
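
a hypothetical user of it (the function and condition below are made
up, just to show where the annotation would go):

	/* somewhere in kernel/time/ or kernel/sched/ */
	static void check_something(bool broken)
	{
		/* teach lockdep that this path may end up in printk() */
		might_printk();

		if (WARN_ON_ONCE(broken))
			return;
	}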


maybe that will make it easier. need to think more.


> BTW: I would like to ask you to slow down a bit. More versions
> of such a non-trivial patchset, that are sent within few days,
> are far too much. I have some other tasks that I need to work on.
> Also I would like to hear opinion from other people. Note that
> many people are busy with the merge window at the moment.

sure, sorry about that! wasn't really happy with that either.
and thanks for your help.

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-10  4:09                   ` Sergey Senozhatsky
@ 2016-10-10 11:17                     ` Petr Mladek
  2016-10-11  7:35                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 35+ messages in thread
From: Petr Mladek @ 2016-10-10 11:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Mon 2016-10-10 13:09:57, Sergey Senozhatsky wrote:
> On (10/06/16 13:32), Petr Mladek wrote:
> > On Thu 2016-10-06 13:22:48, Sergey Senozhatsky wrote:
> > > On (10/05/16 11:50), Petr Mladek wrote:
> > > [..]
> > > > > well, it solves a number of problems that the existing implementation
> > > > > cannot handle.
> > > > 
> > > > Please, provide a summary. I wonder if these are real life problems.
> > > 
> > > 1) some pathces/reports from Byungchul Park
> > > 2) a report from Viresh Kumar.
> > > 4) sleeping function called from inside logbuf lock
> > > 5) ARM specific
> > > 6) logbuf_lock corruption
> > 
> > It is great that you have such a list in hands. It might help
> > to push this solution.
> > 
> > I actually have one more reason for this approach:
> > 
> > It seems that we will need to keep printk_deferred()/WARN_*DEFERRED().
> > We do not know about a better solution for the deadlocks caused
> > by scheduler/timekeeping/console_drivers locks.
> 
> yes, seems so.
> 
> > The pain is that the list of affected locations is hard to maintain.
> > It would definitely help if such problems are reported by lockdep
> > in advance. But lockdep is disabled because it creates the deadlock
> > on its own.
> 
> right. another issue is that those potentially recursive printk/WARN_ON
> calls may be coming from error-handling branches, not all of which are
> easily reachable for automated solutions. so in order to find out there
> is a problem we must hit it [in some cases].

yes

> it may look that lockdep *probably* can report the issues via 'safe' printk,
> but that's a notably huge behavior breakage -- if lockdep report comes from
> an about-to-deadlock irq handler, then we won't see anything from that CPU
> unless there is a panic/nmi panic.
> 
> so it probably has to be semi-automatic/semi-manual:
> - add might_printk() that would acquire/release console sem; or
>   logbuf_lock (which is probably even better)
> - find all functions that do printk/WARN in kernel/time and kernel/sched
> - add might_printk() to those functions (just like might_sleep())
> - run the kernel
> - ...
> - profit

I like the idea with might_printk(). I hope that it will be acceptable
for the scheduler/timekeeping people.

JFYI, I could work on the printk-context handling in lockdep.
I am just working on lockdep support in NMI and am getting
kind of familiar with that code.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-10 11:17                     ` Petr Mladek
@ 2016-10-11  7:35                       ` Sergey Senozhatsky
  2016-10-11  9:30                         ` Petr Mladek
  0 siblings, 1 reply; 35+ messages in thread
From: Sergey Senozhatsky @ 2016-10-11  7:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Andrew Morton,
	Tejun Heo, Calvin Owens, linux-kernel

On (10/10/16 13:17), Petr Mladek wrote:
[..]
> > it may look that lockdep *probably* can report the issues via 'safe' printk,
> > but that's a notably huge behavior breakage -- if lockdep report comes from
> > an about-to-deadlock irq handler, then we won't see anything from that CPU
> > unless there is a panic/nmi panic.
> > 
> > so it probably has to be semi-automatic/semi-manual:
> > - add might_printk() that would acquire/release console sem; or
> >   logbuf_lock (which is probably even better)
> > - find all functions that do printk/WARN in kernel/time and kernel/sched
> > - add might_printk() to those functions (just like might_sleep())
> > - run the kernel
> > - ...
> > - profit
> 
> I like the idea with might_printk(). I hope that it will be acceptable
> for the scheduler/timekeeping people.
> 
> JFYI, I could work on the printk-context handling in lockdep.
> I am just working on a lockdep support in NMI and am getting
> kind of familiar with that code.

sorry, what do you mean by 'printk-context handling in lockdep'?
wouldn't `lockdep + might_printk() + printk_safe' be enough? am I
missing something?

	-ss

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
  2016-10-11  7:35                       ` Sergey Senozhatsky
@ 2016-10-11  9:30                         ` Petr Mladek
  0 siblings, 0 replies; 35+ messages in thread
From: Petr Mladek @ 2016-10-11  9:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Jan Kara, Andrew Morton, Tejun Heo,
	Calvin Owens, linux-kernel

On Tue 2016-10-11 16:35:28, Sergey Senozhatsky wrote:
> On (10/10/16 13:17), Petr Mladek wrote:
> [..]
> > > it may look that lockdep *probably* can report the issues via 'safe' printk,
> > > but that's a notably huge behavior breakage -- if lockdep report comes from
> > > an about-to-deadlock irq handler, then we won't see anything from that CPU
> > > unless there is a panic/nmi panic.
> > > 
> > > so it probably has to be semi-automatic/semi-manual:
> > > - add might_printk() that would acquire/release console sem; or
> > >   logbuf_lock (which is probably even better)
> > > - find all functions that do printk/WARN in kernel/time and kernel/sched
> > > - add might_printk() to those functions (just like might_sleep())
> > > - run the kernel
> > > - ...
> > > - profit
> > 
> > I like the idea with might_printk(). I hope that it will be acceptable
> > for the scheduler/timekeeping people.
> > 
> > JFYI, I could work on the printk-context handling in lockdep.
> > I am just working on a lockdep support in NMI and am getting
> > kind of familiar with that code.
> 
> sorry, what do you mean by 'printk-context handling in lockdep'?
> wouldn't `lockdep + might_printk() + printk_safe' be enough? am I
> missing something?

Good question. It was my intuition that we would need some extra
support in lockdep. It is quite complicated and I have not thought
about it deeply enough yet.

Let's unwind some of my thoughts:

Well, the nice thing about lockdep is that it is able to detect
dangerous lock chains in advance, that is, even without actually
being in the deadlock situation.

IMHO, the printk-related deadlocks will be solved in two ways.
Either by replacing the classic printk() with the deferred
variant or by surrounding the problematic code with
printk_safe_enter()/exit() calls. If we do the fix correctly,
the lockdep warning should disappear.

Now, replacing printk() with printk_deferred() will have an effect
even with the current lockdep code. It is because printk_deferred()
will no longer take the console-related locks, so they will no
longer appear in the call chain.

But what about adding printk_safe_enter()/exit()? Of course,
if we do it correctly, it will prevent a deadlock. But lockdep
is supposed to detect this in advance. And these functions just
handle the printk_context variable. The guarded code will still
take the same locks. Therefore my feeling is that lockdep will
need to be aware of the printk_context. It would have a similar
effect to taking a lock with interrupts enabled or disabled.
In other words, the safe printk context would prevent entering
the same chain of locks recursively, just like disabled
interrupts prevent a recursion.

Finally, note that the proposed might_printk() emulates the classic
printk() call. If the classic printk() causes problems somewhere,
it will be replaced by printk_deferred() and might_printk()
will be removed. In other words, might_printk() shows where
we need the first solution (using printk_deferred()) but it
does not tell lockdep the entire story.


I am not sure if the above makes sense. I need much more
coffee, sleep, and thinking to sort the problem out completely.


Let me look at it from a slightly different angle:

The deadlock is caused by a chain of taken locks. lockdep
is able to warn about it in advance because it knows that
two partial lock chains might eventually happen together.
For this, the information about the interrupt context
and disabled interrupts is important. It tells which
chains of locks might happen together and which are
prevented. IMHO, the printk_context has a similar effect
to the interrupt context stuff.

Well, there is a difference. Being in interrupt context
and having disabled interrupts are two separate values.
printk_context defines just one value. It either means
that the situation with printk is easier or that we
will need one more variable for the lockdep handling
of the printk state.

Anyway, I would not solve lockdep in this patchset.
Let's keep it as another challenge.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-10-11  9:31 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 1/7] printk: use vprintk_func in vprintk() Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 2/7] printk: rename nmi.c and exported api Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer Sergey Senozhatsky
2016-09-29 12:26   ` Petr Mladek
2016-09-30  1:05     ` Sergey Senozhatsky
2016-09-30 11:35       ` Petr Mladek
2016-09-27 14:22 ` [RFC][PATCH 4/7] printk: make alt_printk available when config printk set Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 5/7] printk: drop vprintk_func function Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 6/7] printk: use alternative printk buffers Sergey Senozhatsky
2016-09-29 13:00   ` Petr Mladek
2016-09-30  1:15     ` Sergey Senozhatsky
2016-09-30 11:15       ` Petr Mladek
2016-10-01  2:48         ` Sergey Senozhatsky
2016-10-04 12:22           ` Petr Mladek
2016-10-05  1:36             ` Sergey Senozhatsky
2016-10-05 10:18               ` Petr Mladek
2016-10-03  7:53         ` Sergey Senozhatsky
2016-10-04 14:52         ` Petr Mladek
2016-10-05  1:27           ` Sergey Senozhatsky
2016-10-05  9:50             ` Petr Mladek
2016-10-06  4:22               ` Sergey Senozhatsky
2016-10-06 11:32                 ` Petr Mladek
2016-10-10  4:09                   ` Sergey Senozhatsky
2016-10-10 11:17                     ` Petr Mladek
2016-10-11  7:35                       ` Sergey Senozhatsky
2016-10-11  9:30                         ` Petr Mladek
2016-09-27 14:22 ` [RFC][PATCH 7/7] printk: new printk() recursion detection Sergey Senozhatsky
2016-09-29 13:19   ` Petr Mladek
2016-09-30  2:00     ` Sergey Senozhatsky
2016-09-29 13:25 ` [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Petr Mladek
2016-09-30  2:43   ` Sergey Senozhatsky
2016-09-30 11:27     ` Petr Mladek
2016-10-01  3:02       ` Sergey Senozhatsky
2016-10-04 11:35         ` Petr Mladek
