linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kgdb: Fix broken handling of printk() in NMI context
@ 2020-05-12  8:48 Sumit Garg
  2020-05-12 14:25 ` Daniel Thompson
  0 siblings, 1 reply; 22+ messages in thread
From: Sumit Garg @ 2020-05-12  8:48 UTC (permalink / raw)
  To: daniel.thompson, jason.wessel, dianders
  Cc: kgdb-bugreport, linux-kernel, Sumit Garg

Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()
which involves CPU specific buffers and deferred printk() until exit from
NMI context.

But kgdb being a stop-the-world debugger, we don't want to defer printk()
especially backtrace on corresponding CPUs. So instead switch to normal
printk() mode in kgdb_cpu_enter() if entry is in NMI context.

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---

Similar change was posted earlier specific to arm64 here [1]. But after
discussions it emerged out that this broken handling of printk() in NMI
context should be a common problem that is relevant to other archs as well.
So fix this handling in kgdb_cpu_enter() as there can be multiple entry
points to kgdb in NMI context.

[1] https://lkml.org/lkml/2020/4/24/328

 kernel/debug/debug_core.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 2b7c9b6..ab2933f 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -567,6 +567,15 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
 
+	/*
+	 * kgdb entry in NMI context defaults to use safe NMI printk() which
+	 * involves CPU specific buffers and deferred printk() until exit from
+	 * NMI context. But kgdb being a stop-the-world debugger, we don't want
+	 * to defer printk(). So instead switch to normal printk() mode here.
+	 */
+	if (in_nmi())
+		printk_nmi_exit();
+
 	if (exception_state == DCPU_WANT_MASTER)
 		atomic_inc(&masters_in_kgdb);
 	else
@@ -635,6 +644,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 			atomic_dec(&slaves_in_kgdb);
 			dbg_touch_watchdogs();
 			local_irq_restore(flags);
+			if (in_nmi())
+				printk_nmi_enter();
 			return 0;
 		}
 		cpu_relax();
@@ -772,6 +783,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	raw_spin_unlock(&dbg_master_lock);
 	dbg_touch_watchdogs();
 	local_irq_restore(flags);
+	if (in_nmi())
+		printk_nmi_enter();
 
 	return kgdb_info[cpu].ret_state;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context
  2020-05-12  8:48 [PATCH] kgdb: Fix broken handling of printk() in NMI context Sumit Garg
@ 2020-05-12 14:25 ` Daniel Thompson
  2020-05-13 13:34   ` Sumit Garg
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Thompson @ 2020-05-12 14:25 UTC (permalink / raw)
  To: Sumit Garg
  Cc: jason.wessel, dianders, kgdb-bugreport, linux-kernel, arnd, akpm,
	Petr Mladek, Peter Zijlstra, Steven Rostedt

On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote:
> Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
> in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()

I didn't see the author on Cc: nor any of the folks whose hands it
passed through. It would definitely be good to involve them in this
discussion.


> which involves CPU specific buffers and deferred printk() until exit from
> NMI context.
> 
> But kgdb being a stop-the-world debugger, we don't want to defer printk()
> especially backtrace on corresponding CPUs. So instead switch to normal
> printk() mode in kgdb_cpu_enter() if entry is in NMI context.

So, firstly I should *definitely* take a mea cupla for not shouting
about this at the time (I was on Cc:... twice). Only thing I can say
confidently is that the test suite didn't yell about this and so I
didn't look at this as closely as I should have done (and that it
didn't yell is mostly because I'm still building out the test suite
coverage).

Anyhow...

This feels a little like we are smearing the printk() interception logic
across the kernel in ways that make things hard to read. If we accepted
this patch we then have, the new NMI interception logic, the old kdb
interception logic and some hacks in the kgdb trap handler to defang the
NMI interception logic and force the kdb logic to kick in.

Wouldn't it be better to migrate kdb interception logic up a couple of
levels so that it continues to function even when we are in nmi printk
mode. That way *all* the printk() interception code would end up in
one place.

Finally some clue description of how to provoke the problem would be
useful... that sort of things helps me to grow the test suite coverage.


Daniel.


> 
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
> 
> Similar change was posted earlier specific to arm64 here [1]. But after
> discussions it emerged out that this broken handling of printk() in NMI
> context should be a common problem that is relevant to other archs as well.
> So fix this handling in kgdb_cpu_enter() as there can be multiple entry
> points to kgdb in NMI context.
> 
> [1] https://lkml.org/lkml/2020/4/24/328
> 
>  kernel/debug/debug_core.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 2b7c9b6..ab2933f 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -567,6 +567,15 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  	kgdb_info[ks->cpu].enter_kgdb++;
>  	kgdb_info[ks->cpu].exception_state |= exception_state;
>  
> +	/*
> +	 * kgdb entry in NMI context defaults to use safe NMI printk() which
> +	 * involves CPU specific buffers and deferred printk() until exit from
> +	 * NMI context. But kgdb being a stop-the-world debugger, we don't want
> +	 * to defer printk(). So instead switch to normal printk() mode here.
> +	 */
> +	if (in_nmi())
> +		printk_nmi_exit();
> +
>  	if (exception_state == DCPU_WANT_MASTER)
>  		atomic_inc(&masters_in_kgdb);
>  	else
> @@ -635,6 +644,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  			atomic_dec(&slaves_in_kgdb);
>  			dbg_touch_watchdogs();
>  			local_irq_restore(flags);
> +			if (in_nmi())
> +				printk_nmi_enter();
>  			return 0;
>  		}
>  		cpu_relax();
> @@ -772,6 +783,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  	raw_spin_unlock(&dbg_master_lock);
>  	dbg_touch_watchdogs();
>  	local_irq_restore(flags);
> +	if (in_nmi())
> +		printk_nmi_enter();
>  
>  	return kgdb_info[cpu].ret_state;
>  }
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context
  2020-05-12 14:25 ` Daniel Thompson
@ 2020-05-13 13:34   ` Sumit Garg
  2020-05-14  8:42     ` Petr Mladek
  0 siblings, 1 reply; 22+ messages in thread
From: Sumit Garg @ 2020-05-13 13:34 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Douglas Anderson, kgdb-bugreport,
	Linux Kernel Mailing List, Arnd Bergmann, Andrew Morton,
	Petr Mladek, Peter Zijlstra, Steven Rostedt

On Tue, 12 May 2020 at 19:55, Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote:
> > Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
> > in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()
>
> I didn't see the author on Cc: nor any of the folks whose hands it
> passed through. It would definitely be good to involve them in this
> discussion.
>

Thanks for updating the Cc: list.

>
> > which involves CPU specific buffers and deferred printk() until exit from
> > NMI context.
> >
> > But kgdb being a stop-the-world debugger, we don't want to defer printk()
> > especially backtrace on corresponding CPUs. So instead switch to normal
> > printk() mode in kgdb_cpu_enter() if entry is in NMI context.
>
> So, firstly I should *definitely* take a mea cupla for not shouting
> about this at the time (I was on Cc:... twice). Only thing I can say
> confidently is that the test suite didn't yell about this and so I
> didn't look at this as closely as I should have done (and that it
> didn't yell is mostly because I'm still building out the test suite
> coverage).
>
> Anyhow...
>
> This feels a little like we are smearing the printk() interception logic
> across the kernel in ways that make things hard to read. If we accepted
> this patch we then have, the new NMI interception logic, the old kdb
> interception logic and some hacks in the kgdb trap handler to defang the
> NMI interception logic and force the kdb logic to kick in.
>
> Wouldn't it be better to migrate kdb interception logic up a couple of
> levels so that it continues to function even when we are in nmi printk
> mode. That way *all* the printk() interception code would end up in
> one place.
>

Yes it would be better to have all printk() interception code at one
place. Let me see if I can come up with an integrated logic.

> Finally some clue description of how to provoke the problem would be
> useful... that sort of things helps me to grow the test suite coverage.
>

Sure I will update the description. BTW, this issue can be easily
reproduced via issuing a backtrace (kdb command: "bt or btc") on a CPU
which entered kgdb in NMI context.

-Sumit

>
> Daniel.
>
>
> >
> > Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> > ---
> >
> > Similar change was posted earlier specific to arm64 here [1]. But after
> > discussions it emerged out that this broken handling of printk() in NMI
> > context should be a common problem that is relevant to other archs as well.
> > So fix this handling in kgdb_cpu_enter() as there can be multiple entry
> > points to kgdb in NMI context.
> >
> > [1] https://lkml.org/lkml/2020/4/24/328
> >
> >  kernel/debug/debug_core.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index 2b7c9b6..ab2933f 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -567,6 +567,15 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> >       kgdb_info[ks->cpu].enter_kgdb++;
> >       kgdb_info[ks->cpu].exception_state |= exception_state;
> >
> > +     /*
> > +      * kgdb entry in NMI context defaults to use safe NMI printk() which
> > +      * involves CPU specific buffers and deferred printk() until exit from
> > +      * NMI context. But kgdb being a stop-the-world debugger, we don't want
> > +      * to defer printk(). So instead switch to normal printk() mode here.
> > +      */
> > +     if (in_nmi())
> > +             printk_nmi_exit();
> > +
> >       if (exception_state == DCPU_WANT_MASTER)
> >               atomic_inc(&masters_in_kgdb);
> >       else
> > @@ -635,6 +644,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> >                       atomic_dec(&slaves_in_kgdb);
> >                       dbg_touch_watchdogs();
> >                       local_irq_restore(flags);
> > +                     if (in_nmi())
> > +                             printk_nmi_enter();
> >                       return 0;
> >               }
> >               cpu_relax();
> > @@ -772,6 +783,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> >       raw_spin_unlock(&dbg_master_lock);
> >       dbg_touch_watchdogs();
> >       local_irq_restore(flags);
> > +     if (in_nmi())
> > +             printk_nmi_enter();
> >
> >       return kgdb_info[cpu].ret_state;
> >  }
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context
  2020-05-13 13:34   ` Sumit Garg
@ 2020-05-14  8:42     ` Petr Mladek
  2020-05-15  5:46       ` Sumit Garg
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2020-05-14  8:42 UTC (permalink / raw)
  To: Sumit Garg
  Cc: Daniel Thompson, Jason Wessel, Douglas Anderson, kgdb-bugreport,
	Linux Kernel Mailing List, Arnd Bergmann, Andrew Morton,
	Peter Zijlstra, Steven Rostedt

On Wed 2020-05-13 19:04:48, Sumit Garg wrote:
> On Tue, 12 May 2020 at 19:55, Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
> >
> > On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote:
> > > Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
> > > in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()
> >
> > I didn't see the author on Cc: nor any of the folks whose hands it
> > passed through. It would definitely be good to involve them in this
> > discussion.
> >
> 
> Thanks for updating the Cc: list.
> 
> >
> > > which involves CPU specific buffers and deferred printk() until exit from
> > > NMI context.
> > >
> > > But kgdb being a stop-the-world debugger, we don't want to defer printk()
> > > especially backtrace on corresponding CPUs. So instead switch to normal
> > > printk() mode in kgdb_cpu_enter() if entry is in NMI context.
> >
> > So, firstly I should *definitely* take a mea cupla for not shouting
> > about this at the time (I was on Cc:... twice). Only thing I can say
> > confidently is that the test suite didn't yell about this and so I
> > didn't look at this as closely as I should have done (and that it
> > didn't yell is mostly because I'm still building out the test suite
> > coverage).
> >
> > Anyhow...
> >
> > This feels a little like we are smearing the printk() interception logic
> > across the kernel in ways that make things hard to read. If we accepted
> > this patch we then have, the new NMI interception logic, the old kdb
> > interception logic and some hacks in the kgdb trap handler to defang the
> > NMI interception logic and force the kdb logic to kick in.
> >
> > Wouldn't it be better to migrate kdb interception logic up a couple of
> > levels so that it continues to function even when we are in nmi printk
> > mode. That way *all* the printk() interception code would end up in
> > one place.
> >
> 
> Yes it would be better to have all printk() interception code at one
> place. Let me see if I can come up with an integrated logic.

It might be enough to move the kdb_check from vprintk_default()
to vprintk_func().

I have never used kdb. I did not know that it was able to stop
kernel in any context.

Would this work? It is only compile tested!

From 14ae6c9f0cbd1479cb898c864c7ab46e20f3cf6f Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 14 May 2020 10:37:44 +0200
Subject: [PATCH] printk/kdb: Redirect printk messages into kdb in any context

kdb is able to stop kernel even in NMI context where printk() is redirected
to the printk_safe() lockless variant. Move the check and redirect to kdb
even in this case.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c      | 14 +-------------
 kernel/printk/printk_safe.c |  8 ++++++++
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a9b6156270b..63a1aa377cd9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -35,7 +35,6 @@
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
 #include <linux/crash_core.h>
-#include <linux/kdb.h>
 #include <linux/ratelimit.h>
 #include <linux/kmsg_dump.h>
 #include <linux/syslog.h>
@@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
 
 int vprintk_default(const char *fmt, va_list args)
 {
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	/* Allow to pass printk() to kdb but avoid a recursion. */
-	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
-		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
-		return r;
-	}
-#endif
-	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
-
-	return r;
+	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..81734497c625 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -6,6 +6,7 @@
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 #include <linux/debug_locks.h>
+#include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -359,6 +360,13 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+#ifdef CONFIG_KGDB_KDB
+	/* Allow to pass printk() to kdb but avoid a recursion. */
+	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
+		return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
+	}
+#endif
+
 	/*
 	 * Try to use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context
  2020-05-14  8:42     ` Petr Mladek
@ 2020-05-15  5:46       ` Sumit Garg
  2020-05-15  8:50         ` [PATCH] printk/kdb: Redirect printk messages into kdb in any context Petr Mladek
  0 siblings, 1 reply; 22+ messages in thread
From: Sumit Garg @ 2020-05-15  5:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Daniel Thompson, Jason Wessel, Douglas Anderson, kgdb-bugreport,
	Linux Kernel Mailing List, Arnd Bergmann, Andrew Morton,
	Peter Zijlstra, Steven Rostedt

Hi Petr,

On Thu, 14 May 2020 at 14:13, Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2020-05-13 19:04:48, Sumit Garg wrote:
> > On Tue, 12 May 2020 at 19:55, Daniel Thompson
> > <daniel.thompson@linaro.org> wrote:
> > >
> > > On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote:
> > > > Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
> > > > in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()
> > >
> > > I didn't see the author on Cc: nor any of the folks whose hands it
> > > passed through. It would definitely be good to involve them in this
> > > discussion.
> > >
> >
> > Thanks for updating the Cc: list.
> >
> > >
> > > > which involves CPU specific buffers and deferred printk() until exit from
> > > > NMI context.
> > > >
> > > > But kgdb being a stop-the-world debugger, we don't want to defer printk()
> > > > especially backtrace on corresponding CPUs. So instead switch to normal
> > > > printk() mode in kgdb_cpu_enter() if entry is in NMI context.
> > >
> > > So, firstly I should *definitely* take a mea cupla for not shouting
> > > about this at the time (I was on Cc:... twice). Only thing I can say
> > > confidently is that the test suite didn't yell about this and so I
> > > didn't look at this as closely as I should have done (and that it
> > > didn't yell is mostly because I'm still building out the test suite
> > > coverage).
> > >
> > > Anyhow...
> > >
> > > This feels a little like we are smearing the printk() interception logic
> > > across the kernel in ways that make things hard to read. If we accepted
> > > this patch we then have, the new NMI interception logic, the old kdb
> > > interception logic and some hacks in the kgdb trap handler to defang the
> > > NMI interception logic and force the kdb logic to kick in.
> > >
> > > Wouldn't it be better to migrate kdb interception logic up a couple of
> > > levels so that it continues to function even when we are in nmi printk
> > > mode. That way *all* the printk() interception code would end up in
> > > one place.
> > >
> >
> > Yes it would be better to have all printk() interception code at one
> > place. Let me see if I can come up with an integrated logic.
>
> It might be enough to move the kdb_check from vprintk_default()
> to vprintk_func().
>
> I have never used kdb. I did not know that it was able to stop
> kernel in any context.
>
> Would this work? It is only compile tested!

Thanks for this fix patch. It did resolve the issue and now I am able
to see the backtrace in the kdb shell. So we can go ahead with your
patch and drop mine.

>
> From 14ae6c9f0cbd1479cb898c864c7ab46e20f3cf6f Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 14 May 2020 10:37:44 +0200
> Subject: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
>
> kdb is able to stop kernel even in NMI context where printk() is redirected
> to the printk_safe() lockless variant. Move the check and redirect to kdb
> even in this case.
>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/printk/printk.c      | 14 +-------------
>  kernel/printk/printk_safe.c |  8 ++++++++
>  2 files changed, 9 insertions(+), 13 deletions(-)
>

Reported-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>

-Sumit

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9a9b6156270b..63a1aa377cd9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -35,7 +35,6 @@
>  #include <linux/memblock.h>
>  #include <linux/syscalls.h>
>  #include <linux/crash_core.h>
> -#include <linux/kdb.h>
>  #include <linux/ratelimit.h>
>  #include <linux/kmsg_dump.h>
>  #include <linux/syslog.h>
> @@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
>
>  int vprintk_default(const char *fmt, va_list args)
>  {
> -       int r;
> -
> -#ifdef CONFIG_KGDB_KDB
> -       /* Allow to pass printk() to kdb but avoid a recursion. */
> -       if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
> -               r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> -               return r;
> -       }
> -#endif
> -       r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
> -
> -       return r;
> +       return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
>  }
>  EXPORT_SYMBOL_GPL(vprintk_default);
>
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index d9a659a686f3..81734497c625 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -6,6 +6,7 @@
>  #include <linux/preempt.h>
>  #include <linux/spinlock.h>
>  #include <linux/debug_locks.h>
> +#include <linux/kdb.h>
>  #include <linux/smp.h>
>  #include <linux/cpumask.h>
>  #include <linux/irq_work.h>
> @@ -359,6 +360,13 @@ void __printk_safe_exit(void)
>
>  __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
>  {
> +#ifdef CONFIG_KGDB_KDB
> +       /* Allow to pass printk() to kdb but avoid a recursion. */
> +       if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
> +               return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> +       }
> +#endif
> +
>         /*
>          * Try to use the main logbuf even in NMI. But avoid calling console
>          * drivers that might have their own locks.
> --
> 2.26.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15  5:46       ` Sumit Garg
@ 2020-05-15  8:50         ` Petr Mladek
  2020-05-15 10:33           ` Sergey Senozhatsky
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2020-05-15  8:50 UTC (permalink / raw)
  To: Sumit Garg
  Cc: Daniel Thompson, Jason Wessel, Douglas Anderson, kgdb-bugreport,
	Linux Kernel Mailing List, Arnd Bergmann, Andrew Morton,
	Peter Zijlstra, Steven Rostedt, Sergey Senozhatsky, John Ogness,
	Petr Mladek

kdb is able to stop kernel even in NMI context where printk() is redirected
to the printk_safe() lockless variant. Move the check and redirect to kdb
even in this case.

Reported-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
Sending as proper patch for review.
Adding Sergey into CC and John into CC.

kernel/printk/printk.c      | 14 +-------------
 kernel/printk/printk_safe.c |  7 +++++++
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a9b6156270b..63a1aa377cd9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -35,7 +35,6 @@
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
 #include <linux/crash_core.h>
-#include <linux/kdb.h>
 #include <linux/ratelimit.h>
 #include <linux/kmsg_dump.h>
 #include <linux/syslog.h>
@@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
 
 int vprintk_default(const char *fmt, va_list args)
 {
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	/* Allow to pass printk() to kdb but avoid a recursion. */
-	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
-		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
-		return r;
-	}
-#endif
-	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
-
-	return r;
+	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..7ccb821d0bfe 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -6,6 +6,7 @@
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 #include <linux/debug_locks.h>
+#include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -359,6 +360,12 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+#ifdef CONFIG_KGDB_KDB
+	/* Allow to pass printk() to kdb but avoid a recursion. */
+	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0))
+		return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
+#endif
+
 	/*
 	 * Try to use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15  8:50         ` [PATCH] printk/kdb: Redirect printk messages into kdb in any context Petr Mladek
@ 2020-05-15 10:33           ` Sergey Senozhatsky
  2020-05-15 12:02             ` Sumit Garg
  2020-05-15 13:48             ` Daniel Thompson
  0 siblings, 2 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-05-15 10:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sumit Garg, Daniel Thompson, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt,
	Sergey Senozhatsky, John Ogness

On (20/05/15 10:50), Petr Mladek wrote:
> kdb is able to stop kernel even in NMI context where printk() is redirected
> to the printk_safe() lockless variant. Move the check and redirect to kdb
> even in this case.

Can I please have some context what problem does this solve?
I can see that vkdb_printf() calls into console drivers:

	for_each_console(c) {
		c->write(c, cp, retlen - (cp - kdb_buffer));
		touch_nmi_watchdog();
	}

Is this guaranteed that we never execute this path from NMI?
If so, can this please be added to the commit message? A more
detailed commit message will help a lot.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 10:33           ` Sergey Senozhatsky
@ 2020-05-15 12:02             ` Sumit Garg
  2020-05-15 16:36               ` Sergey Senozhatsky
  2020-05-15 13:48             ` Daniel Thompson
  1 sibling, 1 reply; 22+ messages in thread
From: Sumit Garg @ 2020-05-15 12:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Daniel Thompson, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

Hi Sergey,

On Fri, 15 May 2020 at 16:03, Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> On (20/05/15 10:50), Petr Mladek wrote:
> > kdb is able to stop kernel even in NMI context where printk() is redirected
> > to the printk_safe() lockless variant. Move the check and redirect to kdb
> > even in this case.
>
> Can I please have some context what problem does this solve?

You can find the problem description here [1] which leads to this fix.

> I can see that vkdb_printf() calls into console drivers:
>
>         for_each_console(c) {
>                 c->write(c, cp, retlen - (cp - kdb_buffer));
>                 touch_nmi_watchdog();
>         }
>
> Is this guaranteed that we never execute this path from NMI?

Yes the above code could run in NMI context but it will only run after
we stop the kernel (drop into debugger) and on a single CPU at a time.
AFAIK, we drop into the debugger either via sysrq or during
oops_in_progress. So I think it should be a lockless entry into the
console driver for write operations.

TBH, it's very much possible that I may miss some aspect. So please
feel free to correct me.

[1] https://lkml.org/lkml/2020/5/12/213

-Sumit

> If so, can this please be added to the commit message? A more
> detailed commit message will help a lot.
>
>         -ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 10:33           ` Sergey Senozhatsky
  2020-05-15 12:02             ` Sumit Garg
@ 2020-05-15 13:48             ` Daniel Thompson
  2020-05-15 16:24               ` Sergey Senozhatsky
  2020-05-18  9:21               ` Petr Mladek
  1 sibling, 2 replies; 22+ messages in thread
From: Daniel Thompson @ 2020-05-15 13:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Fri, May 15, 2020 at 07:33:08PM +0900, Sergey Senozhatsky wrote:
> On (20/05/15 10:50), Petr Mladek wrote:
> > kdb is able to stop kernel even in NMI context where printk() is redirected
> > to the printk_safe() lockless variant. Move the check and redirect to kdb
> > even in this case.
> 
> Can I please have some context what problem does this solve?
> I can see that vkdb_printf() calls into console drivers:
> 
> 	for_each_console(c) {
> 		c->write(c, cp, retlen - (cp - kdb_buffer));
> 		touch_nmi_watchdog();
> 	}
> 
> Is this guaranteed that we never execute this path from NMI?

Absolutely not.

The execution context for kdb is pretty much unique... we are running a
debug mode with all CPUs parked in a holding loop with interrupts
disabled. One CPU is at an unknown exception state and the others are
either handling an IRQ or NMI depending on architecture[1].

However there are a number of factors that IMHO weigh in favour of
allowing kdb to intercept here.

1. kgdb/kdb are designed to work from NMI, modulo the bugs that are
   undoubtedly present.

2. A synchronous breakpoint (including an implicit breakpoint-on-oops)
   from any code that executes with irqs disabled will exhibit most of
   the same problems as an NMI but without waking up all the NMI logic.

3. kdb_trap_printk is only set for *very* narrow time intervals by the
   debug master (the single CPU in the system that is *not* in a
   holding loop). Thus in all cases the system has already successfully
   executed kdb_printf() several times before we ever call the printk()
   interception code.

   Or put another way, even if we did tickle a bug speculated about in
   #1, it won't be the call to printk() that triggers it; we'd never
   get that far!


> If so, can this please be added to the commit message? A more
> detailed commit message will help a lot.

I suspect Petr might prefer any future flames about kdb_printf() to be
pointed at me rather than him ;-) so if adding anything to the commit
message then I'd suggest it be based on the reasoning in #3 above.


Daniel.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 13:48             ` Daniel Thompson
@ 2020-05-15 16:24               ` Sergey Senozhatsky
  2020-05-18  9:21               ` Petr Mladek
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-05-15 16:24 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Sergey Senozhatsky, Petr Mladek, Sumit Garg, Jason Wessel,
	Douglas Anderson, kgdb-bugreport, Linux Kernel Mailing List,
	Arnd Bergmann, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	John Ogness

On (20/05/15 14:48), Daniel Thompson wrote:
> On Fri, May 15, 2020 at 07:33:08PM +0900, Sergey Senozhatsky wrote:
> > On (20/05/15 10:50), Petr Mladek wrote:

[..]

> > Is this guaranteed that we never execute this path from NMI?
> 
> Absolutely not.
> 
> The execution context for kdb is pretty much unique...

OK, that was what I expected.

> we are running a debug mode with all CPUs parked in a holding loop with
> interrupts disabled. One CPU is at an unknown exception state and the
> others are either handling an IRQ or NMI depending on architecture[1].

Can a CPU be parked while holding the console driver port lock?

Hmm, a side note - this also means that we cannot handle it from
poll-ing console drivers and need to switch to direct console writes.

> However there are a number of factors that IMHO weigh in favour of
> allowing kdb to intercept here.
> 
> 1. kgdb/kdb are designed to work from NMI, modulo the bugs that are
>    undoubtedly present.
> 
> 2. A synchronous breakpoint (including an implicit breakpoint-on-oops)
>    from any code that executes with irqs disabled will exhibit most of
>    the same problems as an NMI but without waking up all the NMI logic.
> 
> 3. kdb_trap_printk is only set for *very* narrow time intervals by the
>    debug master (the single CPU in the system that is *not* in a
>    holding loop). Thus in all cases the system has already successfully
>    executed kdb_printf() several times before we ever call the printk()
>    interception code.
> 
>    Or put another way, even if we did tickle a bug speculated about in
>    #1, it won't be the call to printk() that triggers it; we'd never
>    get that far!

OK. I would appreciate a more detailed commit message:
- what do we fix, and what risks do we take. Just for the record.

+ a small nit: looking at for_each_console() loop -- not all consoles
can be invoked at any time and not all consoles are enabled at any time.
You _probably_ might want to do what printk does in call_console_drivers()
loop. printk also had problems with console callbacks being placed in
sections that get discarded, but that's way too niche.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 12:02             ` Sumit Garg
@ 2020-05-15 16:36               ` Sergey Senozhatsky
  2020-05-15 16:52                 ` Doug Anderson
  2020-06-10 16:41                 ` Daniel Thompson
  0 siblings, 2 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-05-15 16:36 UTC (permalink / raw)
  To: Sumit Garg
  Cc: Sergey Senozhatsky, Petr Mladek, Daniel Thompson, Jason Wessel,
	Douglas Anderson, kgdb-bugreport, Linux Kernel Mailing List,
	Arnd Bergmann, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	John Ogness

On (20/05/15 17:32), Sumit Garg wrote:
> > Can I please have some context what problem does this solve?
> 
> You can find the problem description here [1] which leads to this fix.

[..]

> [1] https://lkml.org/lkml/2020/5/12/213

Thanks for the link. I'm slightly surprised it took so many years
to notice the addition of printk_nmi/printk_safe :)

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 16:36               ` Sergey Senozhatsky
@ 2020-05-15 16:52                 ` Doug Anderson
  2020-05-18  6:19                   ` Sumit Garg
  2020-06-10 16:41                 ` Daniel Thompson
  1 sibling, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2020-05-15 16:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sumit Garg, Petr Mladek, Daniel Thompson, Jason Wessel,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

Hi,

On Fri, May 15, 2020 at 9:36 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> On (20/05/15 17:32), Sumit Garg wrote:
> > > Can I please have some context what problem does this solve?
> >
> > You can find the problem description here [1] which leads to this fix.
>
> [..]
>
> > [1] https://lkml.org/lkml/2020/5/12/213
>
> Thanks for the link. I'm slightly surprised it took so many years
> to notice the addition of printk_nmi/printk_safe :)

I haven't looked at all the details, but IIUC we don't normally enter
kgdb on the primary CPU through a NMI context, but the secondary ones
(on x86) always do.  Most things are run on the primary CPU and I
think it's relatively unlikely for people to change the primary CPU
(though it is possible).

Probably things got worse when I changed the way "btc" worked to make
it common between all architectures.  See commit 9ef50a686b53
("UPSTREAM: kdb: Fix stack crawling on 'running' CPUs that aren't the
master").  Though theoretically someone could have changed masters and
reproduced the problem with a simple "bt" before my patch, now a
relatively normal command "btc" would tickle the problem.  I didn't
notice it because I work almost totally on arm/arm64 machines and they
don't have NMI (yet).

In general I've always wondered about why (historically) kgdb bugs
have sometimes gone unnoticed for a period of time.  That does seem to
be changing, though, and I've seen a few longstanding bugs getting
fixed recently.  :-)


-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 16:52                 ` Doug Anderson
@ 2020-05-18  6:19                   ` Sumit Garg
  0 siblings, 0 replies; 22+ messages in thread
From: Sumit Garg @ 2020-05-18  6:19 UTC (permalink / raw)
  To: Doug Anderson, Sergey Senozhatsky
  Cc: Petr Mladek, Daniel Thompson, Jason Wessel, kgdb-bugreport,
	Linux Kernel Mailing List, Arnd Bergmann, Andrew Morton,
	Peter Zijlstra, Steven Rostedt, John Ogness

On Fri, 15 May 2020 at 22:22, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, May 15, 2020 at 9:36 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > On (20/05/15 17:32), Sumit Garg wrote:
> > > > Can I please have some context what problem does this solve?
> > >
> > > You can find the problem description here [1] which leads to this fix.
> >
> > [..]
> >
> > > [1] https://lkml.org/lkml/2020/5/12/213
> >
> > Thanks for the link. I'm slightly surprised it took so many years
> > to notice the addition of printk_nmi/printk_safe :)

It's been noticed now since I started playing with NMIs support on
arm64 for kgdb. And that's been only possible with the advent of
pseudo NMIs on arm64 since Linux 5.3 release.

>
> I haven't looked at all the details, but IIUC we don't normally enter
> kgdb on the primary CPU through a NMI context, but the secondary ones
> (on x86) always do.

There's a case for the primary CPU to enter kgdb in NMI context too.
Consider hard-lockup detection based kernel panic. I guess that's
already been working on x86 and should be able to work on arm64 after
this patch [1].

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2020-May/732227.html

-Sumit

>  Most things are run on the primary CPU and I
> think it's relatively unlikely for people to change the primary CPU
> (though it is possible).
>
> Probably things got worse when I changed the way "btc" worked to make
> it common between all architectures.  See commit 9ef50a686b53
> ("UPSTREAM: kdb: Fix stack crawling on 'running' CPUs that aren't the
> master").  Though theoretically someone could have changed masters and
> reproduced the problem with a simple "bt" before my patch, now a
> relatively normal command "btc" would tickle the problem.  I didn't
> notice it because I work almost totally on arm/arm64 machines and they
> don't have NMI (yet).
>
> In general I've always wondered about why (historically) kgdb bugs
> have sometimes gone unnoticed for a period of time.  That does seem to
> be changing, though, and I've seen a few longstanding bugs getting
> fixed recently.  :-)
>
>
> -Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 13:48             ` Daniel Thompson
  2020-05-15 16:24               ` Sergey Senozhatsky
@ 2020-05-18  9:21               ` Petr Mladek
  2020-05-20  4:21                 ` Sergey Senozhatsky
  1 sibling, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2020-05-18  9:21 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Sergey Senozhatsky, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Fri 2020-05-15 14:48:06, Daniel Thompson wrote:
> On Fri, May 15, 2020 at 07:33:08PM +0900, Sergey Senozhatsky wrote:
> > On (20/05/15 10:50), Petr Mladek wrote:
> > > kdb is able to stop kernel even in NMI context where printk() is redirected
> > > to the printk_safe() lockless variant. Move the check and redirect to kdb
> > > even in this case.
> > 
> > Can I please have some context what problem does this solve?
> > I can see that vkdb_printf() calls into console drivers:
> > 
> > 	for_each_console(c) {
> > 		c->write(c, cp, retlen - (cp - kdb_buffer));
> > 		touch_nmi_watchdog();
> > 	}
> > 
> > Is this guaranteed that we never execute this path from NMI?

Good question!

> Absolutely not.
> 
> The execution context for kdb is pretty much unique... we are running a
> debug mode with all CPUs parked in a holding loop with interrupts
> disabled. One CPU is at an unknown exception state and the others are
> either handling an IRQ or NMI depending on architecture[1].

This is similar to the situation in panic() when other CPUs are
stopped. It is more safe when the CPUs are stopped using IRQ.
There is higher danger of a deadlock when NMI is used.

bust_spinlock() is used in panic() to increase the chance to go over
the deadlock and actually see the messages. It is not enough when
more locks are used by the console (VT/TTY is good example). And
it is not guaranteed that the console will still work after
the hack is disabled by bust_spinlocks(0).


> However there are a number of factors that IMHO weigh in favour of
> allowing kdb to intercept here.
> 
> 1. kgdb/kdb are designed to work from NMI, modulo the bugs that are
>    undoubtedly present.

There is definitely a risk of deadlock when console drivers are called
by KDB. There are plans to create some lockless console drivers to
handle panic(). It might be usable in KDB as well.

The question is what are the expectations of KDB users. How often does
it happen that KDB does not work or that the system does not longer
work when continue is called in KDB?


> 2. A synchronous breakpoint (including an implicit breakpoint-on-oops)
>    from any code that executes with irqs disabled will exhibit most of
>    the same problems as an NMI but without waking up all the NMI logic.

Makes sense.


> 3. kdb_trap_printk is only set for *very* narrow time intervals by the
>    debug master (the single CPU in the system that is *not* in a
>    holding loop). Thus in all cases the system has already successfully
>    executed kdb_printf() several times before we ever call the printk()
>    interception code.
>
>    Or put another way, even if we did tickle a bug speculated about in
>    #1, it won't be the call to printk() that triggers it; we'd never
>    get that far!

Good point. I would say that this patch will not make the situation
worse. The code is called when KDB already uses consoles. It is just
a trick how to use existing code to print even more messages in
KDB context.

> 
> > If so, can this please be added to the commit message? A more
> > detailed commit message will help a lot.

What about?

"KDB has to get messages on consoles even when the system is stopped.
It uses kdb_printf() internally and calls console drivers on its own.

It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
global variable to redirect even the normal printk() into the
kdb_printf() variant.

The variable "kdb_trap_printk" is checked in printk_default() and
it is ignored when printk is redirected to printk_safe in NMI context.
Solve this by moving the check into printk_func().

It is obvious that it is not fully safe. But it does not make things
worse. The console drivers are already called in this context by
kdb_printf() direct calls."

> I suspect Petr might prefer any future flames about kdb_printf() to be
> pointed at me rather than him ;-) so if adding anything to the commit
> message then I'd suggest it be based on the reasoning in #3 above.

This is fair :-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-18  9:21               ` Petr Mladek
@ 2020-05-20  4:21                 ` Sergey Senozhatsky
  2020-05-20  9:35                   ` Daniel Thompson
  0 siblings, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-05-20  4:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Daniel Thompson, Sergey Senozhatsky, Sumit Garg, Jason Wessel,
	Douglas Anderson, kgdb-bugreport, Linux Kernel Mailing List,
	Arnd Bergmann, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	John Ogness

On (20/05/18 11:21), Petr Mladek wrote:
[..]
> > > Is this guaranteed that we never execute this path from NMI?
> 
> Good question!
> 
> > Absolutely not.
> > 
> > The execution context for kdb is pretty much unique... we are running a
> > debug mode with all CPUs parked in a holding loop with interrupts
> > disabled. One CPU is at an unknown exception state and the others are
> > either handling an IRQ or NMI depending on architecture[1].
> 
> This is similar to the situation in panic() when other CPUs are
> stopped. It is more safe when the CPUs are stopped using IRQ.
> There is higher danger of a deadlock when NMI is used.
> 
> bust_spinlock() is used in panic() to increase the chance to go over
> the deadlock and actually see the messages. It is not enough when
> more locks are used by the console (VT/TTY is good example). And
> it is not guaranteed that the console will still work after
> the hack is disabled by bust_spinlocks(0).

Good point. It's not guaranteed to help, but bust_spinlocks() does
help in general, many serial drivers do check oops_in_progress and
use a deadlock safe approach when locking port lock. I don't see
bust_spinlocks() being used in kdb, so it probably better start
doing so (along with general for_each_console() loop improvements,
like checking if console is enabled/available/etc).

[..]
> > > If so, can this please be added to the commit message? A more
> > > detailed commit message will help a lot.
> 
> What about?
> 
> "KDB has to get messages on consoles even when the system is stopped.
> It uses kdb_printf() internally and calls console drivers on its own.
> 
> It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
> global variable to redirect even the normal printk() into the
> kdb_printf() variant.
> 
> The variable "kdb_trap_printk" is checked in printk_default() and
> it is ignored when printk is redirected to printk_safe in NMI context.
> Solve this by moving the check into printk_func().
> 
> It is obvious that it is not fully safe. But it does not make things
> worse. The console drivers are already called in this context by
> kdb_printf() direct calls."

This looks more informative indeed. Thanks!

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-20  4:21                 ` Sergey Senozhatsky
@ 2020-05-20  9:35                   ` Daniel Thompson
  2020-05-20 10:22                     ` [PATCH v2] " Petr Mladek
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Thompson @ 2020-05-20  9:35 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Wed, May 20, 2020 at 01:21:02PM +0900, Sergey Senozhatsky wrote:
> On (20/05/18 11:21), Petr Mladek wrote:
> [..]
> > > > Is this guaranteed that we never execute this path from NMI?
> > 
> > Good question!
> > 
> > > Absolutely not.
> > > 
> > > The execution context for kdb is pretty much unique... we are running a
> > > debug mode with all CPUs parked in a holding loop with interrupts
> > > disabled. One CPU is at an unknown exception state and the others are
> > > either handling an IRQ or NMI depending on architecture[1].
> > 
> > This is similar to the situation in panic() when other CPUs are
> > stopped. It is more safe when the CPUs are stopped using IRQ.
> > There is higher danger of a deadlock when NMI is used.
> > 
> > bust_spinlock() is used in panic() to increase the chance to go over
> > the deadlock and actually see the messages. It is not enough when
> > more locks are used by the console (VT/TTY is good example). And
> > it is not guaranteed that the console will still work after
> > the hack is disabled by bust_spinlocks(0).
> 
> Good point. It's not guaranteed to help, but bust_spinlocks() does
> help in general, many serial drivers do check oops_in_progress and
> use a deadlock safe approach when locking port lock. I don't see
> bust_spinlocks() being used in kdb, so it probably better start
> doing so (along with general for_each_console() loop improvements,
> like checking if console is enabled/available/etc).

Agree.

I am also interested in whether we can figure out a way to match
consoles and polling drivers. It is better to use the polling
driver rather than the console since the polling drivers "know"
they will execute from all sorts of crazy places. For the most common
use cases this would also result in no console handler ever being
called.

BTW for those asking how this issue an submarine for so long I think
the main factor is that not all architectures implement NMI.

There are exceptions but kdb is typically only useful when either:

1. We have a real (e.g. not USB) serial port, or
2. We have a real (e.g. not USB) keyboard

On x86, where the SMP peers are rounded up using using an NMI, the
above grows increasingly hard to find (although they are certainly
still here). Combined with this very few commonly embedded
architectures round up using an NMI so these problems cannot occur.

This has allowed kdb to fall rather far behind the rest of the kernel
w.r.t. NMI robustness whilst not being entirely useless.

Sumit's recent work to exploit NMIs on arm64 is bringing some of our
debt to the surface... happily I think that will also help us to tackle
it!


> [..]
> > > > If so, can this please be added to the commit message? A more
> > > > detailed commit message will help a lot.
> > 
> > What about?
> > 
> > "KDB has to get messages on consoles even when the system is stopped.
> > It uses kdb_printf() internally and calls console drivers on its own.
> > 
> > It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
> > global variable to redirect even the normal printk() into the
> > kdb_printf() variant.
> > 
> > The variable "kdb_trap_printk" is checked in printk_default() and
> > it is ignored when printk is redirected to printk_safe in NMI context.
> > Solve this by moving the check into printk_func().

Maybe a "Currently" thrown in we switch from general background
information to describing the problem the patch is about to fix helps us
read it:

  Currently the variable "kdb_trap_printk" is checked

But other than that LGTM.


Daniel.

> > 
> > It is obvious that it is not fully safe. But it does not make things
> > worse. The console drivers are already called in this context by
> > kdb_printf() direct calls."
> 
> This looks more informative indeed. Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-20  9:35                   ` Daniel Thompson
@ 2020-05-20 10:22                     ` Petr Mladek
  2020-05-20 11:17                       ` Sergey Senozhatsky
  2020-05-20 16:07                       ` Daniel Thompson
  0 siblings, 2 replies; 22+ messages in thread
From: Petr Mladek @ 2020-05-20 10:22 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Sergey Senozhatsky, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

kdb has to get messages on consoles even when the system is stopped.
It uses kdb_printf() internally and calls console drivers on its own.

It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
global variable to redirect even the normal printk() into the
kdb_printf() variant.

The variable "kdb_trap_printk" is checked in printk_default() and
it is ignored when printk is redirected to printk_safe in NMI context.
Solve this by moving the check into printk_func().

It is obvious that it is not fully safe. But it does not make things
worse. The console drivers are already called in this context by
db_printf() direct calls.

Reported-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
Changes in v2:

   + more detailed commit message

 kernel/printk/printk.c      | 14 +-------------
 kernel/printk/printk_safe.c |  7 +++++++
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a9b6156270b..63a1aa377cd9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -35,7 +35,6 @@
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
 #include <linux/crash_core.h>
-#include <linux/kdb.h>
 #include <linux/ratelimit.h>
 #include <linux/kmsg_dump.h>
 #include <linux/syslog.h>
@@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
 
 int vprintk_default(const char *fmt, va_list args)
 {
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	/* Allow to pass printk() to kdb but avoid a recursion. */
-	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
-		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
-		return r;
-	}
-#endif
-	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
-
-	return r;
+	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..7ccb821d0bfe 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -6,6 +6,7 @@
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 #include <linux/debug_locks.h>
+#include <linux/kdb.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -359,6 +360,12 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+#ifdef CONFIG_KGDB_KDB
+	/* Allow to pass printk() to kdb but avoid a recursion. */
+	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0))
+		return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
+#endif
+
 	/*
 	 * Try to use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-20 10:22                     ` [PATCH v2] " Petr Mladek
@ 2020-05-20 11:17                       ` Sergey Senozhatsky
  2020-05-20 16:07                       ` Daniel Thompson
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-05-20 11:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Daniel Thompson, Sergey Senozhatsky, Sumit Garg, Jason Wessel,
	Douglas Anderson, kgdb-bugreport, Linux Kernel Mailing List,
	Arnd Bergmann, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	John Ogness

On (20/05/20 12:22), Petr Mladek wrote:
> kdb has to get messages on consoles even when the system is stopped.
> It uses kdb_printf() internally and calls console drivers on its own.
> 
> It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
> global variable to redirect even the normal printk() into the
> kdb_printf() variant.
> 
> The variable "kdb_trap_printk" is checked in printk_default() and
> it is ignored when printk is redirected to printk_safe in NMI context.
> Solve this by moving the check into printk_func().
> 
> It is obvious that it is not fully safe. But it does not make things
> worse. The console drivers are already called in this context by
> db_printf() direct calls.
> 
> Reported-by: Sumit Garg <sumit.garg@linaro.org>
> Tested-by: Sumit Garg <sumit.garg@linaro.org>
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-20 10:22                     ` [PATCH v2] " Petr Mladek
  2020-05-20 11:17                       ` Sergey Senozhatsky
@ 2020-05-20 16:07                       ` Daniel Thompson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Thompson @ 2020-05-20 16:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Wed, May 20, 2020 at 12:22:33PM +0200, Petr Mladek wrote:
> kdb has to get messages on consoles even when the system is stopped.
> It uses kdb_printf() internally and calls console drivers on its own.
> 
> It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
> global variable to redirect even the normal printk() into the
> kdb_printf() variant.
> 
> The variable "kdb_trap_printk" is checked in printk_default() and
> it is ignored when printk is redirected to printk_safe in NMI context.
> Solve this by moving the check into printk_func().
> 
> It is obvious that it is not fully safe. But it does not make things
> worse. The console drivers are already called in this context by
> db_printf() direct calls.
> 
> Reported-by: Sumit Garg <sumit.garg@linaro.org>
> Tested-by: Sumit Garg <sumit.garg@linaro.org>
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>


> ---
> Changes in v2:
> 
>    + more detailed commit message
> 
>  kernel/printk/printk.c      | 14 +-------------
>  kernel/printk/printk_safe.c |  7 +++++++
>  2 files changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9a9b6156270b..63a1aa377cd9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -35,7 +35,6 @@
>  #include <linux/memblock.h>
>  #include <linux/syscalls.h>
>  #include <linux/crash_core.h>
> -#include <linux/kdb.h>
>  #include <linux/ratelimit.h>
>  #include <linux/kmsg_dump.h>
>  #include <linux/syslog.h>
> @@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
>  
>  int vprintk_default(const char *fmt, va_list args)
>  {
> -	int r;
> -
> -#ifdef CONFIG_KGDB_KDB
> -	/* Allow to pass printk() to kdb but avoid a recursion. */
> -	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
> -		r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> -		return r;
> -	}
> -#endif
> -	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
> -
> -	return r;
> +	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
>  }
>  EXPORT_SYMBOL_GPL(vprintk_default);
>  
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index d9a659a686f3..7ccb821d0bfe 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -6,6 +6,7 @@
>  #include <linux/preempt.h>
>  #include <linux/spinlock.h>
>  #include <linux/debug_locks.h>
> +#include <linux/kdb.h>
>  #include <linux/smp.h>
>  #include <linux/cpumask.h>
>  #include <linux/irq_work.h>
> @@ -359,6 +360,12 @@ void __printk_safe_exit(void)
>  
>  __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
>  {
> +#ifdef CONFIG_KGDB_KDB
> +	/* Allow to pass printk() to kdb but avoid a recursion. */
> +	if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0))
> +		return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> +#endif
> +
>  	/*
>  	 * Try to use the main logbuf even in NMI. But avoid calling console
>  	 * drivers that might have their own locks.
> -- 
> 2.26.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-05-15 16:36               ` Sergey Senozhatsky
  2020-05-15 16:52                 ` Doug Anderson
@ 2020-06-10 16:41                 ` Daniel Thompson
  2020-06-11  7:27                   ` Sergey Senozhatsky
  2020-06-11  7:58                   ` Petr Mladek
  1 sibling, 2 replies; 22+ messages in thread
From: Daniel Thompson @ 2020-06-10 16:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sumit Garg, Petr Mladek, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Sat, May 16, 2020 at 01:36:38AM +0900, Sergey Senozhatsky wrote:
> On (20/05/15 17:32), Sumit Garg wrote:
> > > Can I please have some context what problem does this solve?
> > 
> > You can find the problem description here [1] which leads to this fix.
> 
> [..]
> 
> > [1] https://lkml.org/lkml/2020/5/12/213
> 
> Thanks for the link. I'm slightly surprised it took so many years
> to notice the addition of printk_nmi/printk_safe :)

Rather by coincidence (at least I think its a coincidence) the problem
has recently become much more obvious.

0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") just brought
this to the surface by treating debug traps as NMIs. This means the CPU
that takes a breakpoint, and where almost all of the kdb printk() calls
take place, will now unconditionally have printk() interception enabled.


Daniel.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-06-10 16:41                 ` Daniel Thompson
@ 2020-06-11  7:27                   ` Sergey Senozhatsky
  2020-06-11  7:58                   ` Petr Mladek
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2020-06-11  7:27 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Sergey Senozhatsky, Sumit Garg, Petr Mladek, Jason Wessel,
	Douglas Anderson, kgdb-bugreport, Linux Kernel Mailing List,
	Arnd Bergmann, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	John Ogness

On (20/06/10 17:41), Daniel Thompson wrote:
> > Thanks for the link. I'm slightly surprised it took so many years
> > to notice the addition of printk_nmi/printk_safe :)
> 
> Rather by coincidence (at least I think its a coincidence) the problem
> has recently become much more obvious.
> 
> 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") just brought
> this to the surface by treating debug traps as NMIs. This means the CPU
> that takes a breakpoint, and where almost all of the kdb printk() calls
> take place, will now unconditionally have printk() interception enabled.

Interesting. Feels like ist_enter() should have been using
printk_nmi_enter/exit() in the first place.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
  2020-06-10 16:41                 ` Daniel Thompson
  2020-06-11  7:27                   ` Sergey Senozhatsky
@ 2020-06-11  7:58                   ` Petr Mladek
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Mladek @ 2020-06-11  7:58 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Sergey Senozhatsky, Sumit Garg, Jason Wessel, Douglas Anderson,
	kgdb-bugreport, Linux Kernel Mailing List, Arnd Bergmann,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, John Ogness

On Wed 2020-06-10 17:41:40, Daniel Thompson wrote:
> On Sat, May 16, 2020 at 01:36:38AM +0900, Sergey Senozhatsky wrote:
> > On (20/05/15 17:32), Sumit Garg wrote:
> > > > Can I please have some context what problem does this solve?
> > > 
> > > You can find the problem description here [1] which leads to this fix.
> > 
> > [..]
> > 
> > > [1] https://lkml.org/lkml/2020/5/12/213
> > 
> > Thanks for the link. I'm slightly surprised it took so many years
> > to notice the addition of printk_nmi/printk_safe :)
> 
> Rather by coincidence (at least I think its a coincidence) the problem
> has recently become much more obvious.
> 
> 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") just brought
> this to the surface by treating debug traps as NMIs. This means the CPU
> that takes a breakpoint, and where almost all of the kdb printk() calls
> take place, will now unconditionally have printk() interception enabled.

Mea culpa. I have marked this patch as proceed by mistake. It has got
enough acks and is ready for 5.8.

I have just commited the patch into printk/linux.git,
branch for-5.8-kdb-nmi.

I am going to push it to Linus when it passes linux-next integration,
hopefully tomorrow.

Thanks a lot for poking me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-06-11  7:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-12  8:48 [PATCH] kgdb: Fix broken handling of printk() in NMI context Sumit Garg
2020-05-12 14:25 ` Daniel Thompson
2020-05-13 13:34   ` Sumit Garg
2020-05-14  8:42     ` Petr Mladek
2020-05-15  5:46       ` Sumit Garg
2020-05-15  8:50         ` [PATCH] printk/kdb: Redirect printk messages into kdb in any context Petr Mladek
2020-05-15 10:33           ` Sergey Senozhatsky
2020-05-15 12:02             ` Sumit Garg
2020-05-15 16:36               ` Sergey Senozhatsky
2020-05-15 16:52                 ` Doug Anderson
2020-05-18  6:19                   ` Sumit Garg
2020-06-10 16:41                 ` Daniel Thompson
2020-06-11  7:27                   ` Sergey Senozhatsky
2020-06-11  7:58                   ` Petr Mladek
2020-05-15 13:48             ` Daniel Thompson
2020-05-15 16:24               ` Sergey Senozhatsky
2020-05-18  9:21               ` Petr Mladek
2020-05-20  4:21                 ` Sergey Senozhatsky
2020-05-20  9:35                   ` Daniel Thompson
2020-05-20 10:22                     ` [PATCH v2] " Petr Mladek
2020-05-20 11:17                       ` Sergey Senozhatsky
2020-05-20 16:07                       ` Daniel Thompson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).