linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] kdb: Fix locking in vkdb_printf()
@ 2016-10-21 12:50 Petr Mladek
  2016-10-21 12:50 ` [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs Petr Mladek
  2016-10-21 12:50 ` [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Petr Mladek
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Mladek @ 2016-10-21 12:50 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Daniel Thompson, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, linux-kernel, Petr Mladek

I have been asked whether it is safe to call vkdb_printf() from
vprintk_nmi() in NMI context. It seems that it is not safe.
Well, is not a big deal.

But I have noticed suspicious patterns when looking at the
vkdb_printf() locking code and prepared two patches that
should avoid some possible races.

Please, note that I am not familiar with the kdb implementation.
I hope that I have got the requested behavior right. I did
some basic testing and did not found any problem. But I am sure
that I did not test all usecases.

Petr Mladek (2):
  kdb: Properly synchronize vkdb_printf() calls with other CPUs
  kdb: Call vkdb_printf() from vprintk_default() only when wanted

 include/linux/kdb.h            |  1 +
 kernel/debug/kdb/kdb_io.c      | 41 +++++++++++++++++------------------------
 kernel/debug/kdb/kdb_private.h |  1 -
 kernel/printk/printk.c         |  4 +++-
 4 files changed, 21 insertions(+), 26 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs
  2016-10-21 12:50 [PATCH 0/2] kdb: Fix locking in vkdb_printf() Petr Mladek
@ 2016-10-21 12:50 ` Petr Mladek
  2016-11-07 10:07   ` Daniel Thompson
  2016-10-21 12:50 ` [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Petr Mladek
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2016-10-21 12:50 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Daniel Thompson, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, linux-kernel, Petr Mladek

kdb_printf_lock does not prevent other CPUs from entering the critical
section because it is ignored when KDB_STATE_PRINTF_LOCK is set.

The problematic situation might look like:

CPU0					CPU1

vkdb_printf()
  if (!KDB_STATE(PRINTF_LOCK))
    KDB_STATE_SET(PRINTF_LOCK);
    spin_lock_irqsave(&kdb_printf_lock, flags);

					vkdb_printf()
					  if (!KDB_STATE(PRINTF_LOCK))

BANG: The PRINTF_LOCK state is set and CPU1 is entering the critical
section without spinning on the lock.

The problem is that the code tries to implement locking using two state
variables that are not handled atomically. Well, we need a custom
locking because we want to allow reentering the critical section on
the very same CPU.

Let's use solution from Petr Zijlstra that was proposed for a similar
scenario, see
https://lkml.kernel.org/r/20161018171513.734367391@infradead.org

This patch uses the same trick with cmpxchg(). The only difference
is that we want to handle only recursion from the same context and
therefore we disable interrupts.

In addition, we want to be on the safe side and update "kdb_event" outside
of the critical section. Therefore we need an extra barrier before it gets
decremented.

Note that "kdb_event" is always incremented/decremented now. But it should
not cause any harm. The important information is whether it is zero or not.

Finally, KDB_STATE_PRINTF_LOCK is removed. In fact, we are not able to set
it a non-racy way.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/debug/kdb/kdb_io.c      | 36 +++++++++++++++++-------------------
 kernel/debug/kdb/kdb_private.h |  1 -
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index fc1ef736253c..227b59ec7dbe 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -555,16 +555,16 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
 	int colcount;
 	int logging, saved_loglevel = 0;
 	int saved_trap_printk;
-	int got_printf_lock = 0;
 	int retlen = 0;
 	int fnd, len;
+	int this_cpu, old_cpu;
+	static int kdb_printf_cpu = -1;
 	char *cp, *cp2, *cphold = NULL, replaced_byte = ' ';
 	char *moreprompt = "more> ";
 	struct console *c = console_drivers;
-	static DEFINE_SPINLOCK(kdb_printf_lock);
 	unsigned long uninitialized_var(flags);
 
-	preempt_disable();
+	local_irq_save(flags);
 	saved_trap_printk = kdb_trap_printk;
 	kdb_trap_printk = 0;
 
@@ -572,13 +572,14 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
 	 * But if any cpu goes recursive in kdb, just print the output,
 	 * even if it is interleaved with any other text.
 	 */
-	if (!KDB_STATE(PRINTF_LOCK)) {
-		KDB_STATE_SET(PRINTF_LOCK);
-		spin_lock_irqsave(&kdb_printf_lock, flags);
-		got_printf_lock = 1;
-		atomic_inc(&kdb_event);
-	} else {
-		__acquire(kdb_printf_lock);
+	this_cpu = smp_processor_id();
+	atomic_inc(&kdb_event);
+	for (;;) {
+		old_cpu = cmpxchg(&kdb_printf_cpu, -1, this_cpu);
+		if (old_cpu == -1 || old_cpu == this_cpu)
+			break;
+
+		cpu_relax();
 	}
 
 	diag = kdbgetintenv("LINES", &linecount);
@@ -847,16 +848,13 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
 	suspend_grep = 0; /* end of what may have been a recursive call */
 	if (logging)
 		console_loglevel = saved_loglevel;
-	if (KDB_STATE(PRINTF_LOCK) && got_printf_lock) {
-		got_printf_lock = 0;
-		spin_unlock_irqrestore(&kdb_printf_lock, flags);
-		KDB_STATE_CLEAR(PRINTF_LOCK);
-		atomic_dec(&kdb_event);
-	} else {
-		__release(kdb_printf_lock);
-	}
+	/* kdb_printf_cpu locked the code above. */
+	smp_store_release(&kdb_printf_cpu, old_cpu);
+	/* Update kdb_event around the locked section. */
+	smp_mb__before_atomic();
+	atomic_dec(&kdb_event);
 	kdb_trap_printk = saved_trap_printk;
-	preempt_enable();
+	local_irq_restore(flags);
 	return retlen;
 }
 
diff --git a/kernel/debug/kdb/kdb_private.h b/kernel/debug/kdb/kdb_private.h
index 75014d7f4568..fc224fbcf954 100644
--- a/kernel/debug/kdb/kdb_private.h
+++ b/kernel/debug/kdb/kdb_private.h
@@ -132,7 +132,6 @@ extern int kdbgetaddrarg(int, const char **, int*, unsigned long *,
 #define KDB_STATE_PAGER		0x00000400	/* pager is available */
 #define KDB_STATE_GO_SWITCH	0x00000800	/* go is switching
 						 * back to initial cpu */
-#define KDB_STATE_PRINTF_LOCK	0x00001000	/* Holds kdb_printf lock */
 #define KDB_STATE_WAIT_IPI	0x00002000	/* Waiting for kdb_ipi() NMI */
 #define KDB_STATE_RECURSE	0x00004000	/* Recursive entry to kdb */
 #define KDB_STATE_IP_ADJUSTED	0x00008000	/* Restart IP has been
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-10-21 12:50 [PATCH 0/2] kdb: Fix locking in vkdb_printf() Petr Mladek
  2016-10-21 12:50 ` [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs Petr Mladek
@ 2016-10-21 12:50 ` Petr Mladek
  2016-10-23 13:23   ` Sergey Senozhatsky
  2016-11-07 10:24   ` Daniel Thompson
  1 sibling, 2 replies; 11+ messages in thread
From: Petr Mladek @ 2016-10-21 12:50 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Daniel Thompson, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, linux-kernel, Petr Mladek

kdb_trap_printk allows to pass normal printk() messages to kdb via
vkdb_printk(). For example, it is used to get backtrace using
the classic show_stack(), see kdb_show_stack().

vkdb_printf() tries to avoid a potential infinite loop by disabling
the trap. But this approach is racy, for example:

CPU1					CPU2

vkdb_printf()
  // assume that kdb_trap_printk == 0
  saved_trap_printk = kdb_trap_printk;
  kdb_trap_printk = 0;

					kdb_show_stack()
					  kdb_trap_printk++;

Problem1: Now, a nested printk() on CPU0 calls vkdb_printf()
	  even when it should have been disabled. It will not
	  cause a deadlock but...

   // using the outdated saved value: 0
   kdb_trap_printk = saved_trap_printk;

					  kdb_trap_printk--;

Problem2: Now, kdb_trap_printk == -1 and will stay like this.
   It means that all messages will get passed to kdb from
   now on.

This patch removes the racy saved_trap_printk handling. Instead,
the recursion is prevented by a check for the locked CPU.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kdb.h       | 1 +
 kernel/debug/kdb/kdb_io.c | 9 ++-------
 kernel/printk/printk.c    | 4 +++-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/kdb.h b/include/linux/kdb.h
index 410decacff8f..ccd4cf838c66 100644
--- a/include/linux/kdb.h
+++ b/include/linux/kdb.h
@@ -162,6 +162,7 @@ enum kdb_msgsrc {
 };
 
 extern int kdb_trap_printk;
+extern int kdb_printf_cpu;
 extern __printf(2, 0) int vkdb_printf(enum kdb_msgsrc src, const char *fmt,
 				      va_list args);
 extern __printf(1, 2) int kdb_printf(const char *, ...);
diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 227b59ec7dbe..209507e30bd4 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -30,6 +30,7 @@
 char kdb_prompt_str[CMD_BUFLEN];
 
 int kdb_trap_printk;
+int kdb_printf_cpu = -1;
 
 static int kgdb_transition_check(char *buffer)
 {
@@ -554,24 +555,19 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
 	int linecount;
 	int colcount;
 	int logging, saved_loglevel = 0;
-	int saved_trap_printk;
 	int retlen = 0;
 	int fnd, len;
 	int this_cpu, old_cpu;
-	static int kdb_printf_cpu = -1;
 	char *cp, *cp2, *cphold = NULL, replaced_byte = ' ';
 	char *moreprompt = "more> ";
 	struct console *c = console_drivers;
 	unsigned long uninitialized_var(flags);
 
-	local_irq_save(flags);
-	saved_trap_printk = kdb_trap_printk;
-	kdb_trap_printk = 0;
-
 	/* Serialize kdb_printf if multiple cpus try to write at once.
 	 * But if any cpu goes recursive in kdb, just print the output,
 	 * even if it is interleaved with any other text.
 	 */
+	local_irq_save(flags);
 	this_cpu = smp_processor_id();
 	atomic_inc(&kdb_event);
 	for (;;) {
@@ -853,7 +849,6 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
 	/* Update kdb_event around the locked section. */
 	smp_mb__before_atomic();
 	atomic_dec(&kdb_event);
-	kdb_trap_printk = saved_trap_printk;
 	local_irq_restore(flags);
 	return retlen;
 }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d5e397315473..db73e33811e7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
 	int r;
 
 #ifdef CONFIG_KGDB_KDB
-	if (unlikely(kdb_trap_printk)) {
+	/* Allow to pass printk() to kdb but avoid a recursion. */
+	if (unlikely(kdb_trap_printk &&
+		     kdb_printf_cpu != smp_processor_id())) {
 		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
 		return r;
 	}
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-10-21 12:50 ` [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Petr Mladek
@ 2016-10-23 13:23   ` Sergey Senozhatsky
  2016-11-22 12:14     ` Petr Mladek
  2016-11-07 10:24   ` Daniel Thompson
  1 sibling, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-10-23 13:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wessel, Daniel Thompson, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, linux-kernel

On (10/21/16 14:50), Petr Mladek wrote:
[..]
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d5e397315473..db73e33811e7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
>  	int r;
>  
>  #ifdef CONFIG_KGDB_KDB
> -	if (unlikely(kdb_trap_printk)) {
> +	/* Allow to pass printk() to kdb but avoid a recursion. */
> +	if (unlikely(kdb_trap_printk &&
> +		     kdb_printf_cpu != smp_processor_id())) {
					^^^^^
		aren't we are in preemptible here?

	-ss

>  		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
>  		return r;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs
  2016-10-21 12:50 ` [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs Petr Mladek
@ 2016-11-07 10:07   ` Daniel Thompson
  2016-11-22 10:34     ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Thompson @ 2016-11-07 10:07 UTC (permalink / raw)
  To: Petr Mladek, Jason Wessel
  Cc: Peter Zijlstra, Andrew Morton, Sergey Senozhatsky, linux-kernel

On 21/10/16 13:50, Petr Mladek wrote:
> kdb_printf_lock does not prevent other CPUs from entering the critical
> section because it is ignored when KDB_STATE_PRINTF_LOCK is set.
>
> The problematic situation might look like:
>
 > ...
>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/debug/kdb/kdb_io.c      | 36 +++++++++++++++++-------------------
>  kernel/debug/kdb/kdb_private.h |  1 -
>  2 files changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index fc1ef736253c..227b59ec7dbe 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -555,16 +555,16 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
>  	int colcount;
>  	int logging, saved_loglevel = 0;
>  	int saved_trap_printk;
> -	int got_printf_lock = 0;
>  	int retlen = 0;
>  	int fnd, len;
> +	int this_cpu, old_cpu;
> +	static int kdb_printf_cpu = -1;
>  	char *cp, *cp2, *cphold = NULL, replaced_byte = ' ';
>  	char *moreprompt = "more> ";
>  	struct console *c = console_drivers;
> -	static DEFINE_SPINLOCK(kdb_printf_lock);
>  	unsigned long uninitialized_var(flags);
>
> -	preempt_disable();
> +	local_irq_save(flags);
>  	saved_trap_printk = kdb_trap_printk;
>  	kdb_trap_printk = 0;
>
> @@ -572,13 +572,14 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
>  	 * But if any cpu goes recursive in kdb, just print the output,
>  	 * even if it is interleaved with any other text.
>  	 */
> -	if (!KDB_STATE(PRINTF_LOCK)) {
> -		KDB_STATE_SET(PRINTF_LOCK);
> -		spin_lock_irqsave(&kdb_printf_lock, flags);
> -		got_printf_lock = 1;
> -		atomic_inc(&kdb_event);
> -	} else {
> -		__acquire(kdb_printf_lock);
> +	this_cpu = smp_processor_id();
> +	atomic_inc(&kdb_event);

When reviewing I noticed that, when we recursively enter, kdb_event is 
handled differently after this patch so I wanted to figure out if this 
alternative handling of kdb_event was safe.

In the end I concluded it is safe but that's mostly because the *only* 
thing we ever seem to do with kdb_event is increment and decrement it. 
So perhaps adding another patch at the front of this series to nuke this 
variable would be worthwhile (whilst making this patch easier to read).

However, your choice and, either way:
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-10-21 12:50 ` [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Petr Mladek
  2016-10-23 13:23   ` Sergey Senozhatsky
@ 2016-11-07 10:24   ` Daniel Thompson
  2016-11-22 12:45     ` Petr Mladek
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Thompson @ 2016-11-07 10:24 UTC (permalink / raw)
  To: Petr Mladek, Jason Wessel
  Cc: Peter Zijlstra, Andrew Morton, Sergey Senozhatsky, linux-kernel

On 21/10/16 13:50, Petr Mladek wrote:
> kdb_trap_printk allows to pass normal printk() messages to kdb via
> vkdb_printk(). For example, it is used to get backtrace using
> the classic show_stack(), see kdb_show_stack().
>
> vkdb_printf() tries to avoid a potential infinite loop by disabling
> the trap. But this approach is racy, for example:
>
> CPU1					CPU2
>
> vkdb_printf()
>   // assume that kdb_trap_printk == 0
>   saved_trap_printk = kdb_trap_printk;
>   kdb_trap_printk = 0;
>
> 					kdb_show_stack()
> 					  kdb_trap_printk++;

When kdb is running any of the commands that use kdb_trap_printk there 
is a single active CPU and the other CPUs should be in a holding pen 
inside kgdb_cpu_enter().

The only time this is violated is when there is a timeout waiting for 
the other CPUs to report to the holding pen.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d5e397315473..db73e33811e7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
>  	int r;
>
>  #ifdef CONFIG_KGDB_KDB
> -	if (unlikely(kdb_trap_printk)) {
> +	/* Allow to pass printk() to kdb but avoid a recursion. */
> +	if (unlikely(kdb_trap_printk &&
> +		     kdb_printf_cpu != smp_processor_id())) {

Firstly, why !=?

Secondly, if kdb_trap_printk is set and the "wrong" CPU calls printk 
then we have an opportunity to trap a rouge processor in the holding pen 
meaning the test should probably be part of vkdb_printk() anyway.


Daniel.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs
  2016-11-07 10:07   ` Daniel Thompson
@ 2016-11-22 10:34     ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2016-11-22 10:34 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	linux-kernel

On Mon 2016-11-07 10:07:36, Daniel Thompson wrote:
> On 21/10/16 13:50, Petr Mladek wrote:
> >kdb_printf_lock does not prevent other CPUs from entering the critical
> >section because it is ignored when KDB_STATE_PRINTF_LOCK is set.
> >
> >The problematic situation might look like:
> >
> > ...
> >
> >Signed-off-by: Petr Mladek <pmladek@suse.com>
> >---
> > kernel/debug/kdb/kdb_io.c      | 36 +++++++++++++++++-------------------
> > kernel/debug/kdb/kdb_private.h |  1 -
> > 2 files changed, 17 insertions(+), 20 deletions(-)
> >
> >diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> >index fc1ef736253c..227b59ec7dbe 100644
> >--- a/kernel/debug/kdb/kdb_io.c
> >+++ b/kernel/debug/kdb/kdb_io.c
> >@@ -555,16 +555,16 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > 	int colcount;
> > 	int logging, saved_loglevel = 0;
> > 	int saved_trap_printk;
> >-	int got_printf_lock = 0;
> > 	int retlen = 0;
> > 	int fnd, len;
> >+	int this_cpu, old_cpu;
> >+	static int kdb_printf_cpu = -1;
> > 	char *cp, *cp2, *cphold = NULL, replaced_byte = ' ';
> > 	char *moreprompt = "more> ";
> > 	struct console *c = console_drivers;
> >-	static DEFINE_SPINLOCK(kdb_printf_lock);
> > 	unsigned long uninitialized_var(flags);
> >
> >-	preempt_disable();
> >+	local_irq_save(flags);
> > 	saved_trap_printk = kdb_trap_printk;
> > 	kdb_trap_printk = 0;
> >
> >@@ -572,13 +572,14 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > 	 * But if any cpu goes recursive in kdb, just print the output,
> > 	 * even if it is interleaved with any other text.
> > 	 */
> >-	if (!KDB_STATE(PRINTF_LOCK)) {
> >-		KDB_STATE_SET(PRINTF_LOCK);
> >-		spin_lock_irqsave(&kdb_printf_lock, flags);
> >-		got_printf_lock = 1;
> >-		atomic_inc(&kdb_event);
> >-	} else {
> >-		__acquire(kdb_printf_lock);
> >+	this_cpu = smp_processor_id();
> >+	atomic_inc(&kdb_event);
> 
> When reviewing I noticed that, when we recursively enter, kdb_event
> is handled differently after this patch so I wanted to figure out if
> this alternative handling of kdb_event was safe.
> 
> In the end I concluded it is safe but that's mostly because the
> *only* thing we ever seem to do with kdb_event is increment and
> decrement it. So perhaps adding another patch at the front of this
> series to nuke this variable would be worthwhile (whilst making this
> patch easier to read).

Yeah, I was curious about kdb_event as well. I wondered if it might
be used by some 3rd party stuff.

I can find only one usage. It was supposed to affect
WARN_CONSOLE_UNLOCKED(), see
http://www.spinics.net/lists/kdb/msg01733.html
But this was never upstreamed, so kdb_event really looks like an
historical artifact that might get removed. I am going to so in v2.

> However, your choice and, either way:
> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>

Thanks a lot for review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-10-23 13:23   ` Sergey Senozhatsky
@ 2016-11-22 12:14     ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2016-11-22 12:14 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jason Wessel, Daniel Thompson, Peter Zijlstra, Andrew Morton,
	linux-kernel

On Sun 2016-10-23 22:23:43, Sergey Senozhatsky wrote:
> On (10/21/16 14:50), Petr Mladek wrote:
> [..]
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index d5e397315473..db73e33811e7 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
> >  	int r;
> >  
> >  #ifdef CONFIG_KGDB_KDB
> > -	if (unlikely(kdb_trap_printk)) {
> > +	/* Allow to pass printk() to kdb but avoid a recursion. */
> > +	if (unlikely(kdb_trap_printk &&
> > +		     kdb_printf_cpu != smp_processor_id())) {
> 					^^^^^
> 		aren't we are in preemptible here?

Yeah, I looked on this from one side only. "kdb_printf_cpu" is set
with disabled IRQs. Therefore the preemption is disabled if we
are in the recursion scenario. But you are right that we might
get a false positive if we are preempted in the middle of this check
and later scheduled on the CPU that called kdb_vprintf before.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-11-07 10:24   ` Daniel Thompson
@ 2016-11-22 12:45     ` Petr Mladek
  2016-11-22 14:32       ` Daniel Thompson
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2016-11-22 12:45 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	linux-kernel

On Mon 2016-11-07 10:24:22, Daniel Thompson wrote:
> On 21/10/16 13:50, Petr Mladek wrote:
> >kdb_trap_printk allows to pass normal printk() messages to kdb via
> >vkdb_printk(). For example, it is used to get backtrace using
> >the classic show_stack(), see kdb_show_stack().
> >
> >vkdb_printf() tries to avoid a potential infinite loop by disabling
> >the trap. But this approach is racy, for example:
> >
> >CPU1					CPU2
> >
> >vkdb_printf()
> >  // assume that kdb_trap_printk == 0
> >  saved_trap_printk = kdb_trap_printk;
> >  kdb_trap_printk = 0;
> >
> >					kdb_show_stack()
> >					  kdb_trap_printk++;
> 
> When kdb is running any of the commands that use kdb_trap_printk
> there is a single active CPU and the other CPUs should be in a
> holding pen inside kgdb_cpu_enter().
> 
> The only time this is violated is when there is a timeout waiting
> for the other CPUs to report to the holding pen.

It means that the race window is small but it is there. Do I get
it correctly, please?

Thanks a lot for explanation. I was not sure how exactly this worked.
I only saw the games with kdb_printf_cpu in vkdb_printf(). Therefore
I expected that some parallelism was possible.

> 
> >diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >index d5e397315473..db73e33811e7 100644
> >--- a/kernel/printk/printk.c
> >+++ b/kernel/printk/printk.c
> >@@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
> > 	int r;
> >
> > #ifdef CONFIG_KGDB_KDB
> >-	if (unlikely(kdb_trap_printk)) {
> >+	/* Allow to pass printk() to kdb but avoid a recursion. */
> >+	if (unlikely(kdb_trap_printk &&
> >+		     kdb_printf_cpu != smp_processor_id())) {
> 
> Firstly, why !=?
>
> Secondly, if kdb_trap_printk is set and the "wrong" CPU calls printk
> then we have an opportunity to trap a rouge processor in the holding
> pen meaning the test should probably be part of vkdb_printk()
> anyway.

I agree that it is confusing:

On one hand, vkdb_printf() explicitly allows recursion on the same
CPU. See the handling of kdb_printf_lock before the 1st patch from
this series. Also it mentioned by the comment:

	/* Serialize kdb_printf if multiple cpus try to write at once.
	 * But if any cpu goes recursive in kdb, just print the output,
	 * even if it is interleaved with any other text.
	 */


On the other hand. The lines

       saved_trap_printk = kdb_trap_printk;
       kdb_trap_printk = 0;

means that someone wanted to explicitly disable recursion via
the generic printk(). This is the reason why I used "!=" and why
I added this check into vprintk_default().


By other words, we allow recursion caused by kdb internal messages
that are printed directly by kdb_printf()). But we disable recursion
caused by all other messages that are printed using the generic
printk(). This patch keeps the logic. It might make some sense.
But it is hard for me to judge.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-11-22 12:45     ` Petr Mladek
@ 2016-11-22 14:32       ` Daniel Thompson
  2016-11-23 16:29         ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Thompson @ 2016-11-22 14:32 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wessel, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	linux-kernel

On 22/11/16 12:45, Petr Mladek wrote:
> On Mon 2016-11-07 10:24:22, Daniel Thompson wrote:
>> On 21/10/16 13:50, Petr Mladek wrote:
>>> kdb_trap_printk allows to pass normal printk() messages to kdb via
>>> vkdb_printk(). For example, it is used to get backtrace using
>>> the classic show_stack(), see kdb_show_stack().
>>>
>>> vkdb_printf() tries to avoid a potential infinite loop by disabling
>>> the trap. But this approach is racy, for example:
>>>
>>> CPU1					CPU2
>>>
>>> vkdb_printf()
>>>  // assume that kdb_trap_printk == 0
>>>  saved_trap_printk = kdb_trap_printk;
>>>  kdb_trap_printk = 0;
>>>
>>> 					kdb_show_stack()
>>> 					  kdb_trap_printk++;
>>
>> When kdb is running any of the commands that use kdb_trap_printk
>> there is a single active CPU and the other CPUs should be in a
>> holding pen inside kgdb_cpu_enter().
>>
>> The only time this is violated is when there is a timeout waiting
>> for the other CPUs to report to the holding pen.
>
> It means that the race window is small but it is there. Do I get
> it correctly, please?

I don't think the size of the race window is particularly interesting; 
it does exist.

It is more important to be clear that the circumstances when the race 
exists are when an attempt to stop-the-world has failed. So rather than 
close the race it might be more useful to make the race unreachable by 
bringing the CPU that has failed to report under control.


> Thanks a lot for explanation. I was not sure how exactly this worked.
> I only saw the games with kdb_printf_cpu in vkdb_printf(). Therefore
> I expected that some parallelism was possible.

Yes. I'm coming round to the point-of-view that if the code looks like 
it expected to run in parallel if should be correct for it to run in 
parallel! Otherwise its leads to broken thinking when changing things.


>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>> index d5e397315473..db73e33811e7 100644
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> @@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args)
>>> 	int r;
>>>
>>> #ifdef CONFIG_KGDB_KDB
>>> -	if (unlikely(kdb_trap_printk)) {
>>> +	/* Allow to pass printk() to kdb but avoid a recursion. */
>>> +	if (unlikely(kdb_trap_printk &&
>>> +		     kdb_printf_cpu != smp_processor_id())) {
>>
>> Firstly, why !=?

Having re-read things, this comment was spurious.


>> Secondly, if kdb_trap_printk is set and the "wrong" CPU calls printk
>> then we have an opportunity to trap a rouge processor in the holding
>> pen meaning the test should probably be part of vkdb_printk()
>> anyway.
>
> I agree that it is confusing:
>
> On one hand, vkdb_printf() explicitly allows recursion on the same
> CPU. See the handling of kdb_printf_lock before the 1st patch from
> this series. Also it mentioned by the comment:
>
> 	/* Serialize kdb_printf if multiple cpus try to write at once.
> 	 * But if any cpu goes recursive in kdb, just print the output,
> 	 * even if it is interleaved with any other text.
> 	 */
 >
> On the other hand. The lines
>
>        saved_trap_printk = kdb_trap_printk;
>        kdb_trap_printk = 0;
>
> means that someone wanted to explicitly disable recursion via
> the generic printk(). This is the reason why I used "!=" and why
> I added this check into vprintk_default().

vkdb_printf()'s rough job is to:

   1. Issue the string to the kdb console (not the same as kernel
      console)
   2. Store the string in the kernel logs.

Of the above, #2 is achieved by temporarily setting the console log 
level to 0, calling printk() and restoring the log level. This is why 
vkdb_printf() needs the silly dance to avoid the recursion.


> By other words, we allow recursion caused by kdb internal messages
> that are printed directly by kdb_printf(). But we disable recursion
> caused by all other messages that are printed using the generic
> printk(). This patch keeps the logic. It might make some sense.
> But it is hard for me to judge.

Yes, I think it makes sense.

Right now I'm toying with the idea of using the printk_func code for 
printk() interception.

Firstly printk_func is per-cpu which removes a huge chain of complex 
reasoning (for a case that doesn't really exist).

Secondly we could call the saved_printk_func directly making recursion 
impossible.

Sadly, if we accept that a rouge CPU might still be calling printk() 
then the approach above just opens up a race on the console log level...

Sigh...


Daniel.
> Best Regards,
> Petr
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted
  2016-11-22 14:32       ` Daniel Thompson
@ 2016-11-23 16:29         ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2016-11-23 16:29 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	linux-kernel

On Tue 2016-11-22 14:32:39, Daniel Thompson wrote:
> On 22/11/16 12:45, Petr Mladek wrote:
> >On Mon 2016-11-07 10:24:22, Daniel Thompson wrote:
> >>On 21/10/16 13:50, Petr Mladek wrote:
> >>>kdb_trap_printk allows to pass normal printk() messages to kdb via
> >>>vkdb_printk(). For example, it is used to get backtrace using
> >>>the classic show_stack(), see kdb_show_stack().
> >>>
> >>>vkdb_printf() tries to avoid a potential infinite loop by disabling
> >>>the trap. But this approach is racy, for example:
> >>>
> >>>CPU1					CPU2
> >>>
> >>>vkdb_printf()
> >>> // assume that kdb_trap_printk == 0
> >>> saved_trap_printk = kdb_trap_printk;
> >>> kdb_trap_printk = 0;
> >>>
> >>>					kdb_show_stack()
> >>>					  kdb_trap_printk++;
> >>
> >>When kdb is running any of the commands that use kdb_trap_printk
> >>there is a single active CPU and the other CPUs should be in a
> >>holding pen inside kgdb_cpu_enter().
> >>
> >>The only time this is violated is when there is a timeout waiting
> >>for the other CPUs to report to the holding pen.
> >
> >It means that the race window is small but it is there. Do I get
> >it correctly, please?
> 
> I don't think the size of the race window is particularly
> interesting; it does exist.
> 
> It is more important to be clear that the circumstances when the
> race exists are when an attempt to stop-the-world has failed. So
> rather than close the race it might be more useful to make the race
> unreachable by bringing the CPU that has failed to report under
> control.

Yup. The question is how the stop-the-world could fail and if we could
prevent all failures now and in the future.

> vkdb_printf()'s rough job is to:
> 
>   1. Issue the string to the kdb console (not the same as kernel
>      console)
>   2. Store the string in the kernel logs.
> 
> Of the above, #2 is achieved by temporarily setting the console log
> level to 0, calling printk() and restoring the log level. This is
> why vkdb_printf() needs the silly dance to avoid the recursion.
> 
> Right now I'm toying with the idea of using the printk_func code for
> printk() interception.
> 
> Firstly printk_func is per-cpu which removes a huge chain of complex
> reasoning (for a case that doesn't really exist).
> 
> Secondly we could call the saved_printk_func directly making
> recursion impossible.

Note that there are plans to define per-CPU printk_context variable,
see https://lkml.kernel.org/r/20161027154933.1211-4-sergey.senozhatsky@gmail.com
It will be used to choose the right printk_func. It will allow
to have more contexts and enter some of them recursively.

Then it should be easier to add support for KDB context.


> Sadly, if we accept that a rouge CPU might still be calling printk()
> then the approach above just opens up a race on the console log
> level...

We are also very close to add async printk. It will allow to
delegate the console handling into separate kthread. Then printk()
should not longer call console directly. The only exception would
be a system panic, when we really want to get messages out ASAP.

This last patchset can be seen at
https://lkml.kernel.org/r/1461333180-2897-1-git-send-email-sergey.senozhatsky@gmail.com
But Sergey plans to get in the printk_context/printk_safe stuff first.

Anyway, then it should be easier to disable the console in kdb
context.

Best Regards,
Petr

PS: I am going to prepare v2 of this patchset. It might take
some time as I am repeatedly interrupted by some other urgent work.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-11-23 16:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-21 12:50 [PATCH 0/2] kdb: Fix locking in vkdb_printf() Petr Mladek
2016-10-21 12:50 ` [PATCH 1/2] kdb: Properly synchronize vkdb_printf() calls with other CPUs Petr Mladek
2016-11-07 10:07   ` Daniel Thompson
2016-11-22 10:34     ` Petr Mladek
2016-10-21 12:50 ` [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Petr Mladek
2016-10-23 13:23   ` Sergey Senozhatsky
2016-11-22 12:14     ` Petr Mladek
2016-11-07 10:24   ` Daniel Thompson
2016-11-22 12:45     ` Petr Mladek
2016-11-22 14:32       ` Daniel Thompson
2016-11-23 16:29         ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).