linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
@ 2011-03-23 20:32 Cyrill Gorcunov
  2011-03-23 21:16 ` Don Zickus
  2011-03-24  3:30 ` Dongdong Deng
  0 siblings, 2 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2011-03-23 20:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lin Ming, Don Zickus, Jason Wessel, lkml

kgdb needs IPI to be sent and handled before perf
or anything else NMI, otherwise kgdb hangs with bootup
self-tests (found on P4 HT SMP machine). Raise its priority
so that we're called first in a notifier chain.

Reported-by: Don Zickus <dzickus@redhat.com>
Tested-by: Lin Ming <ming.m.lin@intel.com>
CC: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---

Don, Jason, take a look please.

 arch/x86/kernel/kgdb.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/kgdb.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
+++ linux-2.6.git/arch/x86/kernel/kgdb.c
@@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
 	.notifier_call	= kgdb_notify,

 	/*
-	 * Lowest-prio notifier priority, we want to be notified last:
+	 * We might need to send an IPI and
+	 * do cpu roundup before anything else
+	 * in notifier chain so high priority
+	 * is needed.
 	 */
-	.priority	= NMI_LOCAL_LOW_PRIOR,
+	.priority	= NMI_LOCAL_HIGH_PRIOR,
 };

 /**

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-23 20:32 [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority Cyrill Gorcunov
@ 2011-03-23 21:16 ` Don Zickus
  2011-03-23 21:33   ` Cyrill Gorcunov
  2011-03-24  3:30 ` Dongdong Deng
  1 sibling, 1 reply; 9+ messages in thread
From: Don Zickus @ 2011-03-23 21:16 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, Lin Ming, Jason Wessel, lkml

On Wed, Mar 23, 2011 at 11:32:33PM +0300, Cyrill Gorcunov wrote:
> kgdb needs IPI to be sent and handled before perf
> or anything else NMI, otherwise kgdb hangs with bootup
> self-tests (found on P4 HT SMP machine). Raise its priority
> so that we're called first in a notifier chain.

This is only because P4 perf swallows all the nmis.  If that is the case
you are arguing to make the perf nmi at the bottom of the priority list,
which is probably not where it should be due to its volume.

I am stuck debugging P4 problems again for RHEL-6 and I noticed a small
change that is needed (didn't help my problem though) but it looked like
an oversight that might help your case.

Cheers,
Don


diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3769ac8..d945314 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -777,6 +787,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
 	 * the counter has reached zero value and continued counting before
 	 * real NMI signal was received:
 	 */
+	rdmsrl(hwc->event_base, v);
 	if (!(v & ARCH_P4_UNFLAGGED_BIT))
 		return 1;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-23 21:16 ` Don Zickus
@ 2011-03-23 21:33   ` Cyrill Gorcunov
  0 siblings, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2011-03-23 21:33 UTC (permalink / raw)
  To: Don Zickus; +Cc: Ingo Molnar, Lin Ming, Jason Wessel, lkml

On 03/24/2011 12:16 AM, Don Zickus wrote:
> On Wed, Mar 23, 2011 at 11:32:33PM +0300, Cyrill Gorcunov wrote:
>> kgdb needs IPI to be sent and handled before perf
>> or anything else NMI, otherwise kgdb hangs with bootup
>> self-tests (found on P4 HT SMP machine). Raise its priority
>> so that we're called first in a notifier chain.
> 
> This is only because P4 perf swallows all the nmis.  If that is the case
> you are arguing to make the perf nmi at the bottom of the priority list,
> which is probably not where it should be due to its volume.

The problem is that there IPI wait cycle inside kgdb and we are to be sure
to handle it early. And perf eventually can consume kgdb NMI which would
lead to infinite wait loop so I don't see any easier way to deal with it.

> 
> I am stuck debugging P4 problems again for RHEL-6 and I noticed a small
> change that is needed (didn't help my problem though) but it looked like
> an oversight that might help your case.
> 
> Cheers,
> Don
> 
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
> index 3769ac8..d945314 100644
> --- a/arch/x86/kernel/cpu/perf_event_p4.c
> +++ b/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -777,6 +787,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
>  	 * the counter has reached zero value and continued counting before
>  	 * real NMI signal was received:
>  	 */
> +	rdmsrl(hwc->event_base, v);
>  	if (!(v & ARCH_P4_UNFLAGGED_BIT))
>  		return 1;
>  

Good catch! Ack! It seems to be escaped in first place (I fear I forgot to
refresh patch before send it). Mind to send the full patch to Ingo?

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-23 20:32 [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority Cyrill Gorcunov
  2011-03-23 21:16 ` Don Zickus
@ 2011-03-24  3:30 ` Dongdong Deng
  2011-03-24  5:24   ` Cyrill Gorcunov
  1 sibling, 1 reply; 9+ messages in thread
From: Dongdong Deng @ 2011-03-24  3:30 UTC (permalink / raw)
  To: Cyrill Gorcunov, Jason Wessel
  Cc: Ingo Molnar, Lin Ming, Don Zickus, lkml, KGDB Mailing List

On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> kgdb needs IPI to be sent and handled before perf
> or anything else NMI, otherwise kgdb hangs with bootup
> self-tests (found on P4 HT SMP machine). Raise its priority
> so that we're called first in a notifier chain.
>
> Reported-by: Don Zickus <dzickus@redhat.com>
> Tested-by: Lin Ming <ming.m.lin@intel.com>
> CC: Jason Wessel <jason.wessel@windriver.com>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> ---
>
> Don, Jason, take a look please.
>
>  arch/x86/kernel/kgdb.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.git/arch/x86/kernel/kgdb.c
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
> +++ linux-2.6.git/arch/x86/kernel/kgdb.c
> @@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
>        .notifier_call  = kgdb_notify,
>
>        /*
> -        * Lowest-prio notifier priority, we want to be notified last:
> +        * We might need to send an IPI and
> +        * do cpu roundup before anything else
> +        * in notifier chain so high priority
> +        * is needed.
>         */
> -       .priority       = NMI_LOCAL_LOW_PRIOR,
> +       .priority       = NMI_LOCAL_HIGH_PRIOR,
>  };

CC: kgdb maillist.

I quote Jason's early email to explain why debugger tends to
set the low level of die_notifier.

"The original thinking was that if you are using a low level debugger
that you would want to stop on such a event or breakpoint because there
is nothing else handling it and your system is about to print an oops
message."

For keeping above purpose, I have a "ugly" proposal that splitting
kgdb die handling to two parts.

1: The first part with HIGH priority and just handle NMI event,
 if the NMI event is not belong kgdb, it return NOTIFY_DONE and pass to others.

2: The second part with low priority to handling others.

Due to above handling logic could let kgdb source get complexly, I
couldn't make sure
it is a suitable method here, if it is OK, I can send a formal patch. :-)


$git diff
-----------
arch/x86/kernel/kgdb.c |   67 +++++++++++++++++++++++++++++++++++------------
 1 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index dba0b36..afd18db 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -516,23 +516,6 @@ static int __kgdb_notify(struct die_args *args,
unsigned long cmd)
 	struct pt_regs *regs = args->regs;

 	switch (cmd) {
-	case DIE_NMI:
-		if (atomic_read(&kgdb_active) != -1) {
-			/* KGDB CPU roundup */
-			kgdb_nmicallback(raw_smp_processor_id(), regs);
-			was_in_debug_nmi[raw_smp_processor_id()] = 1;
-			touch_nmi_watchdog();
-			return NOTIFY_STOP;
-		}
-		return NOTIFY_DONE;
-
-	case DIE_NMIUNKNOWN:
-		if (was_in_debug_nmi[raw_smp_processor_id()]) {
-			was_in_debug_nmi[raw_smp_processor_id()] = 0;
-			return NOTIFY_STOP;
-		}
-		return NOTIFY_DONE;
-
 	case DIE_DEBUG:
 		if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
 			if (user_mode(regs))
@@ -588,6 +571,52 @@ kgdb_notify(struct notifier_block *self, unsigned
long cmd, void *ptr)
 	return ret;
 }

+static int
+kgdb_nmi_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
+{
+	unsigned long flags;
+	struct pt_regs *regs = ((struct die_args*)ptr)->regs;
+	int ret = NOTIFY_DONE;
+
+	local_irq_save(flags);
+
+	switch (cmd) {
+	case DIE_NMI:
+		if (atomic_read(&kgdb_active) != -1) {
+			/* KGDB CPU roundup */
+			kgdb_nmicallback(raw_smp_processor_id(), regs);
+			was_in_debug_nmi[raw_smp_processor_id()] = 1;
+			touch_nmi_watchdog();
+			ret = NOTIFY_STOP;
+		}
+		break;
+
+	case DIE_NMIUNKNOWN:
+		if (was_in_debug_nmi[raw_smp_processor_id()]) {
+			was_in_debug_nmi[raw_smp_processor_id()] = 0;
+			ret = NOTIFY_STOP;
+		}
+		break;
+	default:
+		break;
+	}
+
+	local_irq_restore(flags);
+	return ret;
+}
+
+static struct notifier_block kgdb_nmi_notifier = {
+	.notifier_call	= kgdb_nmi_notify,
+
+	/*
+	 * We might need to send an IPI and
+	 * do cpu roundup before anything else
+	 * in notifier chain so high priority
+	 * is needed.
+	 */
+	.priority	= NMI_LOCAL_HIGH_PRIOR,
+};
+
 static struct notifier_block kgdb_notifier = {
 	.notifier_call	= kgdb_notify,

@@ -605,6 +634,9 @@ static struct notifier_block kgdb_notifier = {
  */
 int kgdb_arch_init(void)
 {
+	int err = register_die_notifier(&kgdb_nmi_notifier);
+	if (err)
+		return err;
 	return register_die_notifier(&kgdb_notifier);
 }

@@ -673,6 +705,7 @@ void kgdb_arch_exit(void)
 			breakinfo[i].pev = NULL;
 		}
 	}
+	unregister_die_notifier(&kgdb_nmi_notifier);
 	unregister_die_notifier(&kgdb_notifier);
 }

-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-24  3:30 ` Dongdong Deng
@ 2011-03-24  5:24   ` Cyrill Gorcunov
  2011-03-31 17:40     ` Jason Wessel
  0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2011-03-24  5:24 UTC (permalink / raw)
  To: Dongdong Deng
  Cc: Cyrill Gorcunov, Jason Wessel, Ingo Molnar, Lin Ming, Don Zickus,
	lkml, KGDB Mailing List

If Jason is ok with such splitting -- I dont mind either ;)

/sorry for toppost/

On Thursday, March 24, 2011, Dongdong Deng <libfetion@gmail.com> wrote:
> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> kgdb needs IPI to be sent and handled before perf
>> or anything else NMI, otherwise kgdb hangs with bootup
>> self-tests (found on P4 HT SMP machine). Raise its priority
>> so that we're called first in a notifier chain.
>>
>> Reported-by: Don Zickus <dzickus@redhat.com>
>> Tested-by: Lin Ming <ming.m.lin@intel.com>
>> CC: Jason Wessel <jason.wessel@windriver.com>
>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> ---
>>
>> Don, Jason, take a look please.
>>
>>  arch/x86/kernel/kgdb.c |    7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6.git/arch/x86/kernel/kgdb.c
>> =====================================================================
>> --- linux-2.6.git.orig/arch/x86/kernel/kgdb.c
>> +++ linux-2.6.git/arch/x86/kernel/kgdb.c
>> @@ -592,9 +592,12 @@ static struct notifier_block kgdb_notifi
>>        .notifier_call  = kgdb_notify,
>>
>>        /*
>> -        * Lowest-prio notifier priority, we want to be notified last:
>> +        * We might need to send an IPI and
>> +        * do cpu roundup before anything else
>> +        * in notifier chain so high priority
>> +        * is needed.
>>         */
>> -       .priority       = NMI_LOCAL_LOW_PRIOR,
>> +       .priority       = NMI_LOCAL_HIGH_PRIOR,
>>  };
>
> CC: kgdb maillist.
>
> I quote Jason's early email to explain why debugger tends to
> set the low level of die_notifier.
>
> "The original thinking was that if you are using a low level debugger
> that you would want to stop on such a event or breakpoint because there
> is nothing else handling it and your system is about to print an oops
> message."
>
> For keeping above purpose, I have a "ugly" proposal that splitting
> kgdb die handling to two parts.
>
> 1: The first part with HIGH priority and just handle NMI event,
>  if the NMI event is not belong kgdb, it return NOTIFY_DONE and pass to others.
>
> 2: The second part with low priority to handling others.
>
> Due to above handling logic could let kgdb source get complexly, I
> couldn't make sure
> it is a suitable method here, if it is OK, I can send a formal patch. :-)
>
>
> $git diff
> -----------
> arch/x86/kernel/kgdb.c |   67 +++++++++++++++++++++++++++++++++++------------
>  1 files changed, 50 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index dba0b36..afd18db 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -516,23 +516,6 @@ static int __kgdb_notify(struct die_args *args,
> unsigned long cmd)
>         struct pt_regs *regs = args->regs;
>
>         switch (cmd) {
> -       case DIE_NMI:
> -               if (atomic_read(&kgdb_active) != -1) {
> -                       /* KGDB CPU roundup */
> -                       kgdb_nmicallback(raw_smp_processor_id(), regs);
> -                       was_in_debug_nmi[raw_smp_processor_id()] = 1;
> -                       touch_nmi_watchdog();
> -                       return NOTIFY_STOP;
> -               }
> -               return NOTIFY_DONE;
> -
> -       case DIE_NMIUNKNOWN:
> -               if (was_in_debug_nmi[raw_smp_processor_id()]) {
> -                       was_in_debug_nmi[raw_smp_processor_id()] = 0;
> -                       return NOTIFY_STOP;
> -               }
> -               return NOTIFY_DONE;
> -
>         case DIE_DEBUG:
>                 if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
>                         if (user_mode(regs))
> @@ -588,6 +571,52 @@ kgdb_notify(struct notifier_block *self, unsigned
> long cmd, void *ptr)
>         return ret;
>  }
>
> +static int
> +kgdb_nmi_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
> +{
> +       unsigned long flags;
> +       struct pt_regs *regs = ((struct die_args*)ptr)->regs;
> +       int ret = NOTIFY_DONE;
> +
> +       local_irq_save(flags);
> +
> +       switch (cmd) {
> +       case DIE_NMI:
> +               if (atomic_read(&kgdb_active) != -1) {
> +                       /* KGDB CPU roundup */
> +                       kgdb_nmicallback(raw_smp_processor_id(), regs);
> +                       was_in_debug_nmi[raw_smp_processor_id()] = 1;
> +                       touch_nmi_watchdog();
> +                       ret = NOTIFY_STOP;
> +               }
> +               break;
> +
> +       case DIE_NMIUNKNOWN:
> +               if (was_in_debug_nmi[raw_smp_processor_id()]) {
> +                       was_in_debug_nmi[raw_smp_processor_id()] = 0;
> +                       ret = NOTIFY_STOP;
> +               }
> +               break;
> +       default:
> +               break;
> +       }
> +
> +       local_irq_restore(flags);
> +       return ret;
> +}
> +
> +static struct notifier_block kgdb_nmi_notifier = {
> +       .notifier_call  = kgdb_nmi_notify,
> +
> +       /*
> +        * We might need to send an IPI and
> +        * do cpu roundup before anything else
> +        * in notifier chain so high priority
> +        * is needed.
> +        */
> +       .priority       = NMI_LOCAL_HIGH_PRIOR,
> +};
> +
>  static struct notifier_block kgdb_notifier = {
>         .notifier_call  = kgdb_notify,
>
> @@ -605,6 +634,9 @@ static struct notifier_block kgdb_notifier = {
>   */
>  int kgdb_arch_init(void)
>  {
> +       int err = register_die_notifier(&kgdb_nmi_notifier);
> +       if (err)
> +               return err;
>         return register_die_notifier(&kgdb_notifier);
>  }
>
> @@ -673,6 +705,7 @@ void kgdb_arch_exit(void)
>                         breakinfo[i].pev = NULL;
>                 }
>         }
> +       unregister_die_notifier(&kgdb_nmi_notifier);
>         unregister_die_notifier(&kgdb_notifier);
>  }
>
> --
> 1.6.0.4
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-24  5:24   ` Cyrill Gorcunov
@ 2011-03-31 17:40     ` Jason Wessel
  2011-03-31 20:25       ` Cyrill Gorcunov
  2011-04-01  9:26       ` Dongdong Deng
  0 siblings, 2 replies; 9+ messages in thread
From: Jason Wessel @ 2011-03-31 17:40 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Dongdong Deng, Ingo Molnar, Lin Ming, Don Zickus, lkml,
	KGDB Mailing List

On 03/24/2011 12:24 AM, Cyrill Gorcunov wrote:
> If Jason is ok with such splitting -- I dont mind either ;)
>
> On Thursday, March 24, 2011, Dongdong Deng <libfetion@gmail.com> wrote:
>> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>>> kgdb needs IPI to be sent and handled before perf
>>> or anything else NMI, otherwise kgdb hangs with bootup
>>> self-tests (found on P4 HT SMP machine). Raise its priority
>>> so that we're called first in a notifier chain.
>>>

I talked with Cyrill outside the mailing list since he pinged me and I will summarize here.

My initial thought about the patch Deng Dongdong posted was that it was really ugly to have kgdb registered in the notifier chain twice.  I would be willing to live with this for now if we agree that when jump labels are merged to the kernel that we can make use of that instead.

The jump labels would allow us to invoke the debugger directly when the debugger is active much like we do when CONFIG_KGDB_LOW_LEVEL_TRAP is set.  In fact the code that is ifdef'ed with CONFIG_KGDB_LOW_LEVEL_TRAP can make use of the same jump label as the NMI entry and no longer be #ifdef'ed when jump labels come to pass.

The very discussion of the patch raised the question of "why not always have the debugger be first?"  The answer for that lies in that some code needs to run before the debugger to keep the system running assuming you are planning on restarting it after entering the debugger.   The generic die notifier is used for lots of circumstances and the priority the debugger cares about only matter for a select few exception types.

The kmmio, mce-inject, and crash_nmi_nb (from reboot.c) are good examples of in tree code that should run with a higher priority than the debugger because the debugger doesn't know what to do with these code paths, so it sits last in line hoping someone else will deal with the exception else enter the debugger.  For the trap paths the debugger needs to be first in line to deal with the case where where a breakpoint is in a notifier to avoid non-recoverable recursive faults. For NMI it appears we need to run before the perf code or the perf code will eat an nmi event intended for kgdb and result in a dead locked system.

The net result.   I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.

Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix?  I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.

Cheers,
Jason.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-31 17:40     ` Jason Wessel
@ 2011-03-31 20:25       ` Cyrill Gorcunov
  2011-04-01  9:26       ` Dongdong Deng
  1 sibling, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2011-03-31 20:25 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Dongdong Deng, Ingo Molnar, Lin Ming, Don Zickus, lkml,
	KGDB Mailing List

On 03/31/2011 09:40 PM, Jason Wessel wrote:
...
> The net result. I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
> 
> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix? 
> I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
> 
> Cheers,
> Jason.

 Hi Jason, I think Deng Dongdong's patch is a good candidate for a while. Though Don mention another issue
on his Celerone machine so I need to think about the reason for unknown NMIs he observes. I try to get access
to p4 machine tomorrow and test some ideas. Will keep you in touch! (initially I thought to write a long post
about possible hang scenarios but then found they all are crap and I better to check details on real p4 machine
before claiming anything :).

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-03-31 17:40     ` Jason Wessel
  2011-03-31 20:25       ` Cyrill Gorcunov
@ 2011-04-01  9:26       ` Dongdong Deng
  2011-04-01 11:39         ` Cyrill Gorcunov
  1 sibling, 1 reply; 9+ messages in thread
From: Dongdong Deng @ 2011-04-01  9:26 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Cyrill Gorcunov, Ingo Molnar, Lin Ming, Don Zickus, lkml,
	KGDB Mailing List

On Fri, Apr 1, 2011 at 1:40 AM, Jason Wessel <jason.wessel@windriver.com> wrote:
> On 03/24/2011 12:24 AM, Cyrill Gorcunov wrote:
>> If Jason is ok with such splitting -- I dont mind either ;)
>>
>> On Thursday, March 24, 2011, Dongdong Deng <libfetion@gmail.com> wrote:
>>> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>>>> kgdb needs IPI to be sent and handled before perf
>>>> or anything else NMI, otherwise kgdb hangs with bootup
>>>> self-tests (found on P4 HT SMP machine). Raise its priority
>>>> so that we're called first in a notifier chain.
>>>>
>
> I talked with Cyrill outside the mailing list since he pinged me and I will summarize here.
>
> My initial thought about the patch Deng Dongdong posted was that it was really ugly to have kgdb registered in the notifier chain twice.  I would be willing to live with this for now if we agree that when jump labels are merged to the kernel that we can make use of that instead.
>
> The jump labels would allow us to invoke the debugger directly when the debugger is active much like we do when CONFIG_KGDB_LOW_LEVEL_TRAP is set.  In fact the code that is ifdef'ed with CONFIG_KGDB_LOW_LEVEL_TRAP can make use of the same jump label as the NMI entry and no longer be #ifdef'ed when jump labels come to pass.


Thanks to teach kgdb state with jump labels. :-)


>
> The very discussion of the patch raised the question of "why not always have the debugger be first?"  The answer for that lies in that some code needs to run before the debugger to keep the system running assuming you are planning on restarting it after entering the debugger.   The generic die notifier is used for lots of circumstances and the priority the debugger cares about only matter for a select few exception types.
>
> The kmmio, mce-inject, and crash_nmi_nb (from reboot.c) are good examples of in tree code that should run with a higher priority than the debugger because the debugger doesn't know what to do with these code paths, so it sits last in line hoping someone else will deal with the exception else enter the debugger.  For the trap paths the debugger needs to be first in line to deal with the case where where a breakpoint is in a notifier to avoid non-recoverable recursive faults. For NMI it appears we need to run before the perf code or the perf code will eat an nmi event intended for kgdb and result in a dead locked system.
>
> The net result.   I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
>
> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix?  I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.

Hi Jason,

Do I need to send a patch to you?  :-)

BR,
Dongdong

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority
  2011-04-01  9:26       ` Dongdong Deng
@ 2011-04-01 11:39         ` Cyrill Gorcunov
  0 siblings, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2011-04-01 11:39 UTC (permalink / raw)
  To: Dongdong Deng
  Cc: Jason Wessel, Ingo Molnar, Lin Ming, Don Zickus, lkml, KGDB Mailing List

On 04/01/2011 01:26 PM, Dongdong Deng wrote:
...
>>
>> The net result.   I'll sign-off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.
>>
>> Cyrill, I am assuming this is something we want to aim to merge into the 2.6.39 as a regression fix?  I'll try to get a
>> version of Deng Dongdong's patch into linux-next as soon as possible in the mean time.
> 
> Hi Jason,
> 
> Do I need to send a patch to you?  :-)
> 
> BR,
> Dongdong

I've started latest -tip on some ancient p4 machine today and got unknown nmi issue even without kgdb compiled :(
Didn't reveal all the details yet (seems I'll be able to continue investigation at monday only in best case).
But still your patch looks correct to me.

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-04-01 11:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-23 20:32 [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority Cyrill Gorcunov
2011-03-23 21:16 ` Don Zickus
2011-03-23 21:33   ` Cyrill Gorcunov
2011-03-24  3:30 ` Dongdong Deng
2011-03-24  5:24   ` Cyrill Gorcunov
2011-03-31 17:40     ` Jason Wessel
2011-03-31 20:25       ` Cyrill Gorcunov
2011-04-01  9:26       ` Dongdong Deng
2011-04-01 11:39         ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).