* [PATCH][RT] x86: Fix an RT MCE crash
@ 2016-06-30 13:24 minyard
  2016-06-30 13:43 ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: minyard @ 2016-06-30 13:24 UTC (permalink / raw)
  To: linux-rt-users, Steven Rostedt; +Cc: Corey Minyard

From: Corey Minyard <cminyard@mvista.com>

On some x86 systems an MCE interrupt would come in before the kernel
was ready for it.  Looking at the latest RT code, it has similar
(but not quite the same) code, except it adds a bool that tells if
MCE handling is initialized.  Add the same bool for older versions.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

We noticed this issue on a new Broadwell system when we booted RT
on it.  This patch is for 3.10, I'm not sure if it applies to
other kernel versions.

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b..7125584 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
 }
 
 #ifdef CONFIG_PREEMPT_RT_FULL
+static bool notify_work_ready __read_mostly;
 struct task_struct *mce_notify_helper;
 
 static int mce_notify_helper_thread(void *unused)
@@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
 	if (!mce_notify_helper)
 		return -ENOMEM;
 
+	notify_work_ready = true;
 	return 0;
 }
 
 static void mce_notify_work(void)
 {
-	wake_up_process(mce_notify_helper);
+	if (notify_work_ready)
+		wake_up_process(mce_notify_helper);
 }
 #else
 static void mce_notify_work(void)
-- 
2.7.4



* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 13:24 [PATCH][RT] x86: Fix an RT MCE crash minyard
@ 2016-06-30 13:43 ` Steven Rostedt
  2016-06-30 14:49   ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2016-06-30 13:43 UTC (permalink / raw)
  To: minyard; +Cc: linux-rt-users, Corey Minyard

On Thu, 30 Jun 2016 08:24:49 -0500
minyard@acm.org wrote:

> From: Corey Minyard <cminyard@mvista.com>
> 
> On some x86 systems an MCE interrupt would come in before the kernel
> was ready for it.  Looking at the latest RT code, it has similar
> (but not quite the same) code, except it adds a bool that tells if
> MCE handling is initialized.  Add the same bool for older versions.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> We noticed this issue on a new Broadwell system when we booted RT
> on it.  This patch is for 3.10, I'm not sure if it applies to
> other kernel versions.

Do you mean other 'older' versions? and that this works with the
versions after 3.10 without this patch?

-- Steve

> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index aaf4b9b..7125584 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
>  }
>  
>  #ifdef CONFIG_PREEMPT_RT_FULL
> +static bool notify_work_ready __read_mostly;
>  struct task_struct *mce_notify_helper;
>  
>  static int mce_notify_helper_thread(void *unused)
> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
>  	if (!mce_notify_helper)
>  		return -ENOMEM;
>  
> +	notify_work_ready = true;
>  	return 0;
>  }
>  
>  static void mce_notify_work(void)
>  {
> -	wake_up_process(mce_notify_helper);
> +	if (notify_work_ready)
> +		wake_up_process(mce_notify_helper);
>  }
>  #else
>  static void mce_notify_work(void)



* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 13:43 ` Steven Rostedt
@ 2016-06-30 14:49   ` Corey Minyard
  2016-06-30 15:51     ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 14:49 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users, Corey Minyard

On 06/30/2016 08:43 AM, Steven Rostedt wrote:
> On Thu, 30 Jun 2016 08:24:49 -0500
> minyard@acm.org wrote:
>
>> From: Corey Minyard <cminyard@mvista.com>
>>
>> On some x86 systems an MCE interrupt would come in before the kernel
>> was ready for it.  Looking at the latest RT code, it has similar
>> (but not quite the same) code, except it adds a bool that tells if
>> MCE handling is initialized.  Add the same bool for older versions.
>>
>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>> ---
>>   arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> We noticed this issue on a new Broadwell system when we booted RT
>> on it.  This patch is for 3.10, I'm not sure if it applies to
>> other kernel versions.
> Do you mean other 'older' versions? and that this works with the
> versions after 3.10 without this patch?

I haven't looked at supported kernel versions besides 3.10 and 4.4.
The fix was from the 4.4 version of this code.  This patch fixes
v3.10-rt; I can look at finding which other versions need this.  I
was planning to do this, but I wanted to get the patch out for
comments first.

-corey

> -- Steve
>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index aaf4b9b..7125584 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
>>   }
>>   
>>   #ifdef CONFIG_PREEMPT_RT_FULL
>> +static bool notify_work_ready __read_mostly;
>>   struct task_struct *mce_notify_helper;
>>   
>>   static int mce_notify_helper_thread(void *unused)
>> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
>>   	if (!mce_notify_helper)
>>   		return -ENOMEM;
>>   
>> +	notify_work_ready = true;
>>   	return 0;
>>   }
>>   
>>   static void mce_notify_work(void)
>>   {
>> -	wake_up_process(mce_notify_helper);
>> +	if (notify_work_ready)
>> +		wake_up_process(mce_notify_helper);
>>   }
>>   #else
>>   static void mce_notify_work(void)



* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 14:49   ` Corey Minyard
@ 2016-06-30 15:51     ` Steven Rostedt
  2016-06-30 15:58       ` Corey Minyard
                         ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Steven Rostedt @ 2016-06-30 15:51 UTC (permalink / raw)
  To: Corey Minyard; +Cc: linux-rt-users, Corey Minyard, Borislav Petkov

On Thu, 30 Jun 2016 09:49:19 -0500
Corey Minyard <minyard@acm.org> wrote:

> On 06/30/2016 08:43 AM, Steven Rostedt wrote:
> > On Thu, 30 Jun 2016 08:24:49 -0500
> > minyard@acm.org wrote:
> >  
> >> From: Corey Minyard <cminyard@mvista.com>
> >>
> >> On some x86 systems an MCE interrupt would come in before the kernel
> >> was ready for it.  Looking at the latest RT code, it has similar
> >> (but not quite the same) code, except it adds a bool that tells if
> >> MCE handling is initialized.  Add the same bool for older versions.
> >>
> >> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> >> ---
> >>   arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> We noticed this issue on a new Broadwell system when we booted RT
> >> on it.  This patch is for 3.10, I'm not sure if it applies to
> >> other kernel versions.  
> > Do you mean other 'older' versions? and that this works with the
> > versions after 3.10 without this patch?  
> 
> I haven't looked at supported kernel versions besides 3.10 and 4.4.
> The fix was from the 4.4 version of this code.  This patch fixes
> v3.10-rt; I can look at finding which other versions need this.  I
> was planning to do this, but I wanted to get the patch out for
> comments first.

I'm not an MCE expert (I just Cc'd one though ;-)

OK, so you are saying that the fix was from 4.4-rt? I can go and look
for it, and if so, I can add it to the "backport" patches I need to do,
which I need to get to soon (backporting patches from previous
versions). It may already be in that list.

-- Steve

> 
> -corey
> 
> > -- Steve
> >  
> >> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> >> index aaf4b9b..7125584 100644
> >> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> >> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> >> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
> >>   }
> >>   
> >>   #ifdef CONFIG_PREEMPT_RT_FULL
> >> +static bool notify_work_ready __read_mostly;
> >>   struct task_struct *mce_notify_helper;
> >>   
> >>   static int mce_notify_helper_thread(void *unused)
> >> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
> >>   	if (!mce_notify_helper)
> >>   		return -ENOMEM;
> >>   
> >> +	notify_work_ready = true;
> >>   	return 0;
> >>   }
> >>   
> >>   static void mce_notify_work(void)
> >>   {
> >> -	wake_up_process(mce_notify_helper);
> >> +	if (notify_work_ready)
> >> +		wake_up_process(mce_notify_helper);
> >>   }
> >>   #else
> >>   static void mce_notify_work(void)  



* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 15:51     ` Steven Rostedt
@ 2016-06-30 15:58       ` Corey Minyard
  2016-06-30 16:01       ` Borislav Petkov
  2016-06-30 16:04       ` Corey Minyard
  2 siblings, 0 replies; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 15:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users, Corey Minyard, Borislav Petkov

On 06/30/2016 10:51 AM, Steven Rostedt wrote:
> On Thu, 30 Jun 2016 09:49:19 -0500
> Corey Minyard <minyard@acm.org> wrote:
>
>> On 06/30/2016 08:43 AM, Steven Rostedt wrote:
>>> On Thu, 30 Jun 2016 08:24:49 -0500
>>> minyard@acm.org wrote:
>>>   
>>>> From: Corey Minyard <cminyard@mvista.com>
>>>>
>>>> On some x86 systems an MCE interrupt would come in before the kernel
>>>> was ready for it.  Looking at the latest RT code, it has similar
>>>> (but not quite the same) code, except it adds a bool that tells if
>>>> MCE handling is initialized.  Add the same bool for older versions.
>>>>
>>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>>> ---
>>>>    arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> We noticed this issue on a new Broadwell system when we booted RT
>>>> on it.  This patch is for 3.10, I'm not sure if it applies to
>>>> other kernel versions.
>>> Do you mean other 'older' versions? and that this works with the
>>> versions after 3.10 without this patch?
>> I haven't looked at supported kernel versions besides 3.10 and 4.4.
>> The fix was from the 4.4 version of this code.  This patch fixes
>> v3.10-rt; I can look at finding which other versions need this.  I
>> was planning to do this, but I wanted to get the patch out for
>> comments first.
> I'm not an MCE expert (I just Cc'd one though ;-)

Ok.  It's not really an MCE bug per se, just an initialization
order bug.

>
> OK, so you are saying that the fix was from 4.4-rt? I can go and look
> for it, and if so, I can add it to the "backport" patches I need to do.
> Which I need to go and do that soon (backport patches from previous
> versions). It may already be in that list.

The fix was from 4.4-rt, but it's not a separate fix.  The 4.4 change is
d21959b8ad98 (x86/mce: use swait queue for mce wakeups)
and it's doing the same thing as the 3.10-rt change
49fe500d2abd (x86/mce: Defer mce wakeups to threads for
PREEMPT_RT).

The 3.10-rt change just doesn't have the bool that fixes the
initialization order issue.
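
As a rough userspace sketch of that ordering guard (not the kernel code:
struct helper, notify_init() and notify_work() are hypothetical names, and
the C11 atomics are an assumption made so the example stands alone, whereas
the patch itself uses a plain __read_mostly bool), the notifier simply
refuses to touch the helper until init has published it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the helper thread (a task_struct in the patch). */
struct helper {
	int wakeups;	/* counts how often the helper was "woken" */
};

static struct helper *notify_helper;		/* NULL until init has run */
static atomic_bool notify_ready = false;	/* set last, once the helper exists */

/* Analogue of mce_notify_work_init(): publish the helper, then set the flag. */
static int notify_init(struct helper *h)
{
	if (!h)
		return -1;
	notify_helper = h;
	atomic_store_explicit(&notify_ready, true, memory_order_release);
	return 0;
}

/* Analogue of mce_notify_work(): returns true if a wakeup was delivered. */
static bool notify_work(void)
{
	/* Too early: the helper may not exist yet, so skip the wakeup. */
	if (!atomic_load_explicit(&notify_ready, memory_order_acquire))
		return false;

	assert(notify_helper != NULL);
	notify_helper->wakeups++;
	return true;
}
```

Calling notify_work() before notify_init() returns false and leaves the
helper alone; after init it bumps the wakeup counter.  The early event is
dropped rather than queued, which is also what the guarded wake_up_process()
call in the patch does.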

-corey

>
> -- Steve
>
>> -corey
>>
>>> -- Steve
>>>   
>>>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>>>> index aaf4b9b..7125584 100644
>>>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>>>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>>>> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
>>>>    }
>>>>    
>>>>    #ifdef CONFIG_PREEMPT_RT_FULL
>>>> +static bool notify_work_ready __read_mostly;
>>>>    struct task_struct *mce_notify_helper;
>>>>    
>>>>    static int mce_notify_helper_thread(void *unused)
>>>> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
>>>>    	if (!mce_notify_helper)
>>>>    		return -ENOMEM;
>>>>    
>>>> +	notify_work_ready = true;
>>>>    	return 0;
>>>>    }
>>>>    
>>>>    static void mce_notify_work(void)
>>>>    {
>>>> -	wake_up_process(mce_notify_helper);
>>>> +	if (notify_work_ready)
>>>> +		wake_up_process(mce_notify_helper);
>>>>    }
>>>>    #else
>>>>    static void mce_notify_work(void)



* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 15:51     ` Steven Rostedt
  2016-06-30 15:58       ` Corey Minyard
@ 2016-06-30 16:01       ` Borislav Petkov
  2016-06-30 16:17         ` Luck, Tony
  2016-07-01  9:20         ` Daniel Wagner
  2016-06-30 16:04       ` Corey Minyard
  2 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2016-06-30 16:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Corey Minyard, linux-rt-users, Corey Minyard, Tony Luck

+ Tony.

On Thu, Jun 30, 2016 at 11:51:01AM -0400, Steven Rostedt wrote:
> > >> From: Corey Minyard <cminyard@mvista.com>
> > >>
> > >> On some x86 systems an MCE interrupt would come in before the kernel
> > >> was ready for it.  Looking at the latest RT code, it has similar
> > >> (but not quite the same) code, except it adds a bool that tells if
> > >> MCE handling is initialized.  Add the same bool for older versions.
> > >>
> > >> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> > >> ---
> > >>   arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
> > >>   1 file changed, 4 insertions(+), 1 deletion(-)
> > >>
> > >> We noticed this issue on a new Broadwell system when we booted RT

Do you have any logs which hint at when exactly the MCE gets raised?

> > >> on it.  This patch is for 3.10, I'm not sure if it applies to
> > >> other kernel versions.  
> > > Do you mean other 'older' versions? and that this works with the
> > > versions after 3.10 without this patch?  
> > 
> > I haven't looked at supported kernel versions besides 3.10 and 4.4.
> > The fix was from the 4.4 version of this code.  This patch fixes
> > v3.10-rt; I can look at finding which other versions need this.  I
> > was planning to do this, but I wanted to get the patch out for
> > comments first.
> 
> I'm not an MCE expert (I just Cc'd one though ;-)
> 
> OK, so you are saying that the fix was from 4.4-rt? I can go and look
> for it, and if so, I can add it to the "backport" patches I need to do.
> Which I need to go and do that soon (backport patches from previous
> versions). It may already be in that list.
> 
> -- Steve
> 
> > 
> > -corey
> > 
> > > -- Steve
> > >  
> > >> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > >> index aaf4b9b..7125584 100644
> > >> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > >> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > >> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
> > >>   }
> > >>   
> > >>   #ifdef CONFIG_PREEMPT_RT_FULL
> > >> +static bool notify_work_ready __read_mostly;
> > >>   struct task_struct *mce_notify_helper;
> > >>   
> > >>   static int mce_notify_helper_thread(void *unused)
> > >> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)

Hmm, what is mce_notify_work_init() ?

This must be some RT-homegrown thing.

What is it supposed to do? Upstream is much different from 3.10 or
whatever that kernel version is.

> > >>   	if (!mce_notify_helper)
> > >>   		return -ENOMEM;
> > >>   
> > >> +	notify_work_ready = true;
> > >>   	return 0;
> > >>   }
> > >>   
> > >>   static void mce_notify_work(void)

That is gone upstream too AFAICT.

> > >>   {
> > >> -	wake_up_process(mce_notify_helper);
> > >> +	if (notify_work_ready)
> > >> +		wake_up_process(mce_notify_helper);
> > >>   }
> > >>   #else
> > >>   static void mce_notify_work(void)  

Color me puzzled.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 15:51     ` Steven Rostedt
  2016-06-30 15:58       ` Corey Minyard
  2016-06-30 16:01       ` Borislav Petkov
@ 2016-06-30 16:04       ` Corey Minyard
  2 siblings, 0 replies; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 16:04 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users, Corey Minyard, Borislav Petkov

On 06/30/2016 10:51 AM, Steven Rostedt wrote:
> On Thu, 30 Jun 2016 09:49:19 -0500
> Corey Minyard <minyard@acm.org> wrote:
>
>> On 06/30/2016 08:43 AM, Steven Rostedt wrote:
>>> On Thu, 30 Jun 2016 08:24:49 -0500
>>> minyard@acm.org wrote:
>>>   
>>>> From: Corey Minyard <cminyard@mvista.com>
>>>>
>>>> On some x86 systems an MCE interrupt would come in before the kernel
>>>> was ready for it.  Looking at the latest RT code, it has similar
>>>> (but not quite the same) code, except it adds a bool that tells if
>>>> MCE handling is initialized.  Add the same bool for older versions.
>>>>
>>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>>> ---
>>>>    arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> We noticed this issue on a new Broadwell system when we booted RT
>>>> on it.  This patch is for 3.10, I'm not sure if it applies to
>>>> other kernel versions.
>>> Do you mean other 'older' versions? and that this works with the
>>> versions after 3.10 without this patch?
>> I haven't looked at supported kernel versions besides 3.10 and 4.4.
>> The fix was from the 4.4 version of this code.  This patch fixes
>> v3.10-rt; I can look at finding which other versions need this.  I
>> was planning to do this, but I wanted to get the patch out for
>> comments first.

FYI, it looks like 3.12-rt and 3.14-rt have the same issue.  3.18-rt has
the same code as 4.4-rt.

-corey

> I'm not an MCE expert (I just Cc'd one though ;-)
>
> OK, so you are saying that the fix was from 4.4-rt? I can go and look
> for it, and if so, I can add it to the "backport" patches I need to do.
> Which I need to go and do that soon (backport patches from previous
> versions). It may already be in that list.
>
> -- Steve
>
>> -corey
>>
>>> -- Steve
>>>   
>>>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>>>> index aaf4b9b..7125584 100644
>>>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>>>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>>>> @@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
>>>>    }
>>>>    
>>>>    #ifdef CONFIG_PREEMPT_RT_FULL
>>>> +static bool notify_work_ready __read_mostly;
>>>>    struct task_struct *mce_notify_helper;
>>>>    
>>>>    static int mce_notify_helper_thread(void *unused)
>>>> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
>>>>    	if (!mce_notify_helper)
>>>>    		return -ENOMEM;
>>>>    
>>>> +	notify_work_ready = true;
>>>>    	return 0;
>>>>    }
>>>>    
>>>>    static void mce_notify_work(void)
>>>>    {
>>>> -	wake_up_process(mce_notify_helper);
>>>> +	if (notify_work_ready)
>>>> +		wake_up_process(mce_notify_helper);
>>>>    }
>>>>    #else
>>>>    static void mce_notify_work(void)



* RE: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 16:01       ` Borislav Petkov
@ 2016-06-30 16:17         ` Luck, Tony
  2016-06-30 16:40           ` Corey Minyard
  2016-07-01  9:20         ` Daniel Wagner
  1 sibling, 1 reply; 25+ messages in thread
From: Luck, Tony @ 2016-06-30 16:17 UTC (permalink / raw)
  To: Borislav Petkov, Steven Rostedt
  Cc: Corey Minyard, linux-rt-users, Corey Minyard

> Do you have any logs which hint at when exactly the MCE gets raised?

And the values of the MCi_STATUS (+ADDR/MISC if appropriate) registers that were being logged.

Did the kernel try to log in response to an INT#18 machine check? A CMCI? Or was the kernel
just polling the banks (it does that early in boot to see if there are errors left over from a previous
crash).
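
For reference when reading such a log, here is a minimal sketch of how the
architectural flag bits of an MCi_STATUS value can be pulled apart.  The bit
positions follow the Intel SDM layout and the macro names mirror the ones in
the kernel's mce.h; struct mci_status_fields and decode_mci_status() are
hypothetical names for illustration, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural flag bits of IA32_MCi_STATUS (Intel SDM layout). */
#define MCI_STATUS_VAL   (1ULL << 63)	/* register contains a valid error */
#define MCI_STATUS_OVER  (1ULL << 62)	/* a previous error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61)	/* error was not corrected */
#define MCI_STATUS_EN    (1ULL << 60)	/* error reporting was enabled */
#define MCI_STATUS_MISCV (1ULL << 59)	/* MCi_MISC holds valid data */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* MCi_ADDR holds a valid address */
#define MCI_STATUS_PCC   (1ULL << 57)	/* processor context corrupt */

static_assert(MCI_STATUS_VAL == 0x8000000000000000ULL, "VAL must be bit 63");

/* Hypothetical container for the decoded fields. */
struct mci_status_fields {
	bool valid, overflow, uncorrected, enabled, miscv, addrv, pcc;
	uint16_t mca_code;	/* bits 15:0, architectural MCA error code */
	uint16_t model_code;	/* bits 31:16, model-specific error code */
};

static struct mci_status_fields decode_mci_status(uint64_t status)
{
	struct mci_status_fields f = {
		.valid       = (status & MCI_STATUS_VAL) != 0,
		.overflow    = (status & MCI_STATUS_OVER) != 0,
		.uncorrected = (status & MCI_STATUS_UC) != 0,
		.enabled     = (status & MCI_STATUS_EN) != 0,
		.miscv       = (status & MCI_STATUS_MISCV) != 0,
		.addrv       = (status & MCI_STATUS_ADDRV) != 0,
		.pcc         = (status & MCI_STATUS_PCC) != 0,
		.mca_code    = (uint16_t)(status & 0xffff),
		.model_code  = (uint16_t)((status >> 16) & 0xffff),
	};
	return f;
}
```

ADDRV and MISCV are what tell you whether the ADDR/MISC registers are worth
looking at for a given bank.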

-Tony


* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 16:17         ` Luck, Tony
@ 2016-06-30 16:40           ` Corey Minyard
  2016-06-30 17:01             ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 16:40 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov, Steven Rostedt; +Cc: linux-rt-users, Corey Minyard

On 06/30/2016 11:17 AM, Luck, Tony wrote:
>> Do you have any logs which hint at when exactly the MCE gets raised?
> And the values of the MCi_STATUS (+ADDR/MISC if appropriate) registers that were being logged.

I don't see any of that logged.

> Did the kernel try to log in response to an INT#18 machine check? A CMCI? Or was the kernel
> just polling the banks (it does that early in boot to see if there are errors left over from a previous
> crash).

I'm not sure.  I've included the entire boot log below...

-corey

> -Tony

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.102-rt112 (cminyard@t430) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #1 SMP PREEMPT RT Thu Jun 30 11:26:20 CDT 2016
[    0.000000] Command line: BOOT_IMAGE=/tftpboot/Wren root=/dev/nfs rw ip=bootp console=ttyS0,115200n8
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000077720fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000077721000-0x00000000777a1fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000777a2000-0x0000000079130fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000079131000-0x000000007b309fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007b30a000-0x000000007b96cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007b96d000-0x000000007bad8fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007bad9000-0x000000007bafffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007bb00000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x180000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820: last_pfn = 0x7bb00 max_arch_pfn = 0x400000000
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000] init_memory_mapping: [mem 0x17fe00000-0x17fffffff]
[    0.000000] init_memory_mapping: [mem 0x17c000000-0x17fdfffff]
[    0.000000] init_memory_mapping: [mem 0x100000000-0x17bffffff]
[    0.000000] init_memory_mapping: [mem 0x00100000-0x77720fff]
[    0.000000] init_memory_mapping: [mem 0x777a2000-0x79130fff]
[    0.000000] init_memory_mapping: [mem 0x7bad9000-0x7bafffff]
[    0.000000] ACPI: RSDP 0x00000000000F0540 00024 (v02 INTEL )
[    0.000000] ACPI: XSDT 0x000000007BAD70E8 000AC (v01 INTEL  INTEL ID 00000000 INTL 01000013)
[    0.000000] ACPI: FACP 0x000000007BAD6000 000F4 (v04 INTEL  INTEL ID 00000000 INTL 20091013)
[    0.000000] ACPI: DSDT 0x000000007BA9B000 3099F (v02 INTEL  INTEL ID 00000003 INTL 20091013)
[    0.000000] ACPI: FACS 0x000000007B969000 00040
[    0.000000] ACPI: UEFI 0x000000007B95A000 00042 (v01 INTEL EDK2     00000002      01000013)
[    0.000000] ACPI: HPET 0x000000007BAD5000 00038 (v01 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] ACPI: APIC 0x000000007BAD4000 00B10 (v02 INTEL  INTEL ID 00000000 INTL 20091013)
[    0.000000] ACPI: MCFG 0x000000007BAD3000 0003C (v01 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] ACPI: MIGT 0x000000007BAD2000 00040 (v01 INTEL  INTEL ID 00000000 INTL 20091013)
[    0.000000] ACPI: MSCT 0x000000007BAD1000 00090 (v01 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] ACPI: SLIT 0x000000007BAD0000 0006C (v01 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] ACPI: SRAT 0x000000007BACE000 01130 (v03 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] ACPI: SVOS 0x000000007BACD000 00032 (v01 INTEL  INTEL ID 00000000 INTL 20091013)
[    0.000000] ACPI: WDDT 0x000000007BACC000 00040 (v01 INTEL  INTEL ID 00000000 INTL 20091013)
[    0.000000] ACPI: SSDT 0x000000007B99E000 FCD22 (v02 INTEL  SSDT PM 00004000 INTL 20130328)
[    0.000000] ACPI: SSDT 0x000000007B99B000 027FC (v02 INTEL SpsNm    00000002 INTL 20130328)
[    0.000000] ACPI: SSDT 0x000000007B99A000 00064 (v02 INTEL SpsNvs   00000002 INTL 20130328)
[    0.000000] ACPI: PRAD 0x000000007B999000 00102 (v02 INTEL SpsPrAgg 00000002 INTL 20130328)
[    0.000000] ACPI: SPCR 0x000000007B998000 00050 (v01                 00000000      00000000)
[    0.000000] ACPI: DMAR 0x000000007B997000 00080 (v01 INTEL  INTEL ID 00000001 INTL 20091013)
[    0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x08 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0a -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0c -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0e -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x05 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x09 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0b -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0d -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0f -> Node 0
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x17fffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x17fffffff]
[    0.000000]   NODE_DATA [mem 0x17fff8000-0x17fffcfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x17fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0x77720fff]
[    0.000000]   node   0: [mem 0x777a2000-0x79130fff]
[    0.000000]   node   0: [mem 0x7bad9000-0x7bafffff]
[    0.000000]   node   0: [mem 0x100000000-0x17fffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x10] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x11] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x12] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x13] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x14] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x15] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x16] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x17] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x18] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x19] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x20] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x21] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x22] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x23] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x24] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x25] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x26] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x27] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x28] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x29] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x2f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x30] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x31] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x32] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x33] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x34] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x35] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x36] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x37] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x38] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x39] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x3f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x40] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x41] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x42] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x43] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x44] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x45] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x46] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x47] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x48] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x49] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x4f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x50] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x51] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x52] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x53] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x54] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x55] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x56] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x57] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x58] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x59] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x5f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x60] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x61] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x62] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x63] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x64] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x65] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x66] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x67] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x68] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x69] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x6f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x70] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x71] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x72] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x73] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x74] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x75] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x76] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x77] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x78] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x79] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x7f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x80] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x81] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x82] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x83] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x84] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x85] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x86] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x87] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x88] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x89] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x8f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x90] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x91] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x92] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x93] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x94] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x95] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x96] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x97] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x98] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x99] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9a] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9b] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9c] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9d] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9e] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x9f] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa0] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa1] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa2] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa3] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa4] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa5] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa6] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa7] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa8] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xa9] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xaa] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xab] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xac] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xad] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xae] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xaf] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb0] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb1] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb2] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb3] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb4] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb5] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb6] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb7] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb8] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xb9] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xba] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xbb] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xbc] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xbd] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xbe] high level lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xbf] high level lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x09] address[0xfec01000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 3 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 4 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] smpboot: 192 Processors exceeds NR_CPUS limit of 32
[    0.000000] smpboot: Allowing 32 CPUs, 16 hotplug CPUs
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
[    0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 0000000077721000 - 00000000777a2000
[    0.000000] PM: Registered nosave memory: 0000000079131000 - 000000007b30a000
[    0.000000] PM: Registered nosave memory: 000000007b30a000 - 000000007b96d000
[    0.000000] PM: Registered nosave memory: 000000007b96d000 - 000000007bad9000
[    0.000000] PM: Registered nosave memory: 000000007bb00000 - 0000000090000000
[    0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
[    0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed20000
[    0.000000] PM: Registered nosave memory: 00000000fed20000 - 00000000ff800000
[    0.000000] PM: Registered nosave memory: 00000000ff800000 - 0000000100000000
[    0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
[    0.000000] setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:1
[    0.000000] PERCPU: Embedded 25 pages/cpu @ffff88017fa00000 s73152 r8192 d21056 u131072
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1004060
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/tftpboot/Wren root=/dev/nfs rw ip=bootp console=ttyS0,115200n8
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 3930216k/6291456k available (7156k kernel code, 2211372k absent, 149868k reserved, 5704k data, 896k init)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000]     CONFIG_RCU_FANOUT set to non-default value of 32
[    0.000000] NR_IRQS:4352 nr_irqs:1344 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] allocated 67108864 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.001000] tsc: Detected 1396.737 MHz processor
[    0.000078] Calibrating delay loop (skipped), value calculated using timer frequency.. 2793.47 BogoMIPS (lpj=1396737)
[    0.000087] pid_max: default: 32768 minimum: 301
[    0.000331] Security Framework initialized
[    0.001704] Dentry cache hash table entries: 524288 (order: 11, 8388608 bytes)
[    0.038268] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.047030] Mount-cache hash table entries: 256
[    0.048697] Initializing cgroup subsys memory
[    0.048857] Initializing cgroup subsys devices
[    0.048869] Initializing cgroup subsys freezer
[    0.048886] Initializing cgroup subsys net_cls
[    0.048901] Initializing cgroup subsys blkio
[    0.048912] Initializing cgroup subsys perf_event
[    0.048922] Initializing cgroup subsys net_prio
[    0.048936] Initializing cgroup subsys hugetlb
[    0.049210] CPU: Physical Processor ID: 0
[    0.049215] CPU: Processor Core ID: 0
[    0.049246] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.049246] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    0.049270] mce: CPU supports 22 MCE banks
[    0.049383] CPU0: Thermal monitoring enabled (TM1)
[    0.049465] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.049465] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
[    0.049465] tlb_flushall_shift: 6
[    0.050830] Freeing SMP alternatives: 28k freed
[    0.050877] ACPI: Core revision 20130328
[    0.162702] ACPI: All ACPI Tables successfully acquired
[    0.163841] Switched APIC routing to physical flat.
[    0.163984] BUG: unable to handle kernel NULL pointer dereference at 0000000000000600
[    0.164075] IP: [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
[    0.164077] PGD 0
[    0.164079] Oops: 0002 [#1] PREEMPT SMP
[    0.164088] Modules linked in:
[    0.164099] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.102-rt112 #1
[    0.164105] Hardware name: Intel Corp. GRANGEVILLE/GRANTLEY, BIOS GNVDCRB1.86B.0030.R00.1411131050 11/13/2014
[    0.164106] task: ffff880176680840 ti: ffff880176076000 task.ti: ffff880176076000
[    0.164112] RIP: 0010:[<ffffffff816f344d>] [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
[    0.164115] RSP: 0000:ffff88017fa03f00  EFLAGS: 00010097
[    0.164116] RAX: ffff880176077fd8 RBX: 0000000000000600 RCX: 0000000000010001
[    0.164117] RDX: 0000000000000100 RSI: 0000000000000000 RDI: 0000000000000001
[    0.164118] RBP: ffff88017fa03f10 R08: 0000000000000000 R09: 0000000000000000
[    0.164118] R10: 0000000000000000 R11: 0000000000000038 R12: 0000000000000082
[    0.164119] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.164121] FS:  0000000000000000(0000) GS:ffff88017fa00000(0000) knlGS:0000000000000000
[    0.164122] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.164122] CR2: 0000000000000600 CR3: 0000000001c0e000 CR4: 00000000003406f0
[    0.164123] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.164124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.164124] Stack:
[    0.164148]  0000000000000003 0000000000000600 ffff88017fa03f60 ffffffff8106dcd8
[    0.164150]  0000000000000000 0000000000000000 000000005775bcb3 ffff88017fa0bc60
[    0.164153]  0000000000000202 0000000000000000 0000000000000000 0000000000000000
[    0.164153] Call Trace:
[    0.164165]  <IRQ>
[    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
[    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
[    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
[    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
[    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
[    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
[    0.164223]  <EOI>
[    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
[    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
[    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
[    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
[    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
[    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
[    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
[    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
[    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5 41 54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00 <f0> 66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
[    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
[    0.164298]  RSP <ffff88017fa03f00>
[    0.164299] CR2: 0000000000000600
[    0.656225] ---[ end trace 0000000000000001 ]---
[    0.656233] Kernel panic - not syncing: Fatal exception in interrupt
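
For anyone reading the oops: the fault address is 0x600 rather than 0, which
is what you would expect if try_to_wake_up() was handed a NULL task pointer
and then tried to take a lock embedded in the task structure.  RBX and CR2
both hold 0x600, and the faulting instruction is a locked xadd on (%rbx).
The user-space sketch below only illustrates the arithmetic.  struct
fake_task and its layout are made up, and 0x600 being the offset of pi_lock
in this particular 3.10-rt build is my assumption, not something I have
checked against the build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct task_struct.  The padding is chosen so
 * that the lock lands at offset 0x600; the real offset depends on the
 * kernel configuration and is only assumed here.
 */
struct fake_task {
	char earlier_fields[0x600];	/* everything before the lock */
	int pi_lock;			/* stand-in for the raw spinlock */
};

/*
 * Address the CPU would touch when locking p->pi_lock.  Computed with
 * integer arithmetic so that nothing is actually dereferenced.
 */
uintptr_t lock_address(const struct fake_task *p)
{
	return (uintptr_t)p + offsetof(struct fake_task, pi_lock);
}

/* With a NULL task the lock address is just the member offset. */
void demo_null_task_fault(void)
{
	assert(lock_address(NULL) == 0x600);
}
```

So a NULL mce_notify_helper would show up exactly like this: a "NULL pointer
dereference" at a small non-zero address inside _raw_spin_lock_irqsave.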


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 16:40           ` Corey Minyard
@ 2016-06-30 17:01             ` Borislav Petkov
  2016-06-30 17:18               ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-06-30 17:01 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users, Corey Minyard

On Thu, Jun 30, 2016 at 11:40:17AM -0500, Corey Minyard wrote:
> I'm not sure.  I've included the entire boot log below...

...

> [    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
> [    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
> [    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
> [    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
> [    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
> [    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
> [    0.164223]  <EOI>
> [    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
> [    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
> [    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
> [    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
> [    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> [    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
> [    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
> [    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> [    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5
> 41 54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00
> <f0> 66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
> [    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
> [    0.164298]  RSP <ffff88017fa03f00>
> [    0.164299] CR2: 0000000000000600
> [    0.656225] ---[ end trace 0000000000000001 ]---
> [    0.656233] Kernel panic - not syncing: Fatal exception in interrupt

Hmm, we have that setup_local_APIC -> cmci_recheck path on latest kernel
too. However, we do init CMCI earlier, down the start_kernel() path.

Would it be possible to boot latest upstream kernel on it to see whether
it explodes the same way?

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 17:01             ` Borislav Petkov
@ 2016-06-30 17:18               ` Corey Minyard
  2016-06-30 17:26                 ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 17:18 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users, Corey Minyard

On 06/30/2016 12:01 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 11:40:17AM -0500, Corey Minyard wrote:
>> I'm not sure.  I've included the entire boot log below...
> ...
>
>> [    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
>> [    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
>> [    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
>> [    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
>> [    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
>> [    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
>> [    0.164223]  <EOI>
>> [    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
>> [    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
>> [    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
>> [    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
>> [    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
>> [    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
>> [    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
>> [    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
>> [    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5
>> 41 54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00
>> <f0> 66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
>> [    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
>> [    0.164298]  RSP <ffff88017fa03f00>
>> [    0.164299] CR2: 0000000000000600
>> [    0.656225] ---[ end trace 0000000000000001 ]---
>> [    0.656233] Kernel panic - not syncing: Fatal exception in interrupt
> Hmm, we have that setup_local_APIC -> cmci_recheck path on latest kernel
> too. However, we do init CMCI earlier, down the start_kernel() path.
>
> Would it be possible to boot latest upstream kernel on it to see whether
> it explodes the same way?
>
> Thanks.

This is on 3.10-rt with PREEMPT_RT enabled.  It appears that from 3.18-rt
and later it has code like the change I have proposed, so it does not crash.

I could add something to 4.6-rt to see if the interrupt is coming in early;
is that what you are looking for?

-corey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 17:18               ` Corey Minyard
@ 2016-06-30 17:26                 ` Borislav Petkov
  2016-06-30 17:54                   ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-06-30 17:26 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users, Corey Minyard

On Thu, Jun 30, 2016 at 12:18:01PM -0500, Corey Minyard wrote:
> This is on 3.10-rt with PREEMPT_RT enabled.  It appears that from 3.18-rt
> and later it has code like the change I have proposed, so it does not crash.
> 
> I could add something to 4.6-rt to see if the interrupt is coming in early;
> is that what you are looking for?

Actually, I'd like to know first whether the unpatched upstream kernel -
not -rt - is crashing.

And then 4.6-rt.

Because from looking at your splat, you're getting a thresholding
interrupt the moment you enable the local APIC and from staring at the
MCE code upstream, I think we should be prepared for that scenario.

AFAICT, both -rt and upstream should handle that case just fine and I'm
guessing upstream was fixed at some point and -rt grew another fix which
is probably not needed and it should take the upstream one instead...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 17:26                 ` Borislav Petkov
@ 2016-06-30 17:54                   ` Corey Minyard
  2016-06-30 18:22                     ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 17:54 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users, Corey Minyard

On 06/30/2016 12:26 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 12:18:01PM -0500, Corey Minyard wrote:
>> This is on 3.10-rt with PREEMPT_RT enabled.  It appears that from 3.18-rt
>> and later it has code like the change I have proposed, so it does not crash.
>>
>> I could add something to 4.6-rt to see if the interrupt is coming in early;
>> is that what you are looking for?
> Actually, I'd like to know first whether the unpatched upstream kernel -
> not -rt - is crashing.

It won't crash.  If you disable PREEMPT_RT on the 3.10-rt kernel it won't
crash (which I have tested).  With PREEMPT_RT, the kernel creates a
separate thread that is woken on mce notifications.  The trouble is
that the interrupts are initialized before the thread is created.
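
As a rough illustration of that ordering problem, here is a userspace
sketch, not kernel code.  All names are hypothetical stand-ins:
notify_helper plays the role of mce_notify_helper, notify_work_init()
the role of mce_notify_work_init(), and notify_work() the role of a
guarded mce_notify_work().  The guard shown here is a plain NULL check
and is only an assumption about how such a guard might look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; "struct task" stands in for the kernel's task_struct. */
struct task {
	int wakeups;			/* times the helper has been woken */
};

static struct task helper_storage;
static struct task *notify_helper;	/* NULL until notify_work_init() runs */

/* Models mce_notify_work_init(), which runs late in boot. */
static void notify_work_init(void)
{
	notify_helper = &helper_storage;
}

/*
 * Models a guarded mce_notify_work().  Returns true if the helper was
 * woken, false if the notification arrived before the helper existed.
 * Without the check, the early call would dereference a NULL pointer.
 */
static bool notify_work(void)
{
	if (!notify_helper)
		return false;
	notify_helper->wakeups++;
	return true;
}

/* Replays the boot order from the report: interrupt first, init second. */
static void replay_boot_order(void)
{
	assert(!notify_work());		/* early interrupt: dropped, no crash */
	notify_work_init();
	assert(notify_work());		/* later interrupt: helper is woken */
}
```

In this model the early notification is simply dropped; whether dropping
it is acceptable in the real kernel is a separate question.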

> And then 4.6-rt.
>
> Because from looking at your splat, you're getting a thresholding
> interrupt the moment you enable the local APIC and from staring at the
> MCE code upstream, I think we should be prepared for that scenario.
>
> AFAICT, both -rt and upstream should handle that case just fine and I'm
> guessing upstream was fixed at some point and -rt grew another fix which
> is probably not needed and it should take the upstream one instead...

This is not a bug in mainline.  This is only an RT bug, and only
with PREEMPT_RT enabled.

I can try these things if you really want, but it doesn't seem like
a useful activity to me.

It looks like in 3.18-rt someone noticed this issue and fixed it,
but the fix wasn't backported to earlier kernels.  I'm really just
trying to get that fix backported.

-corey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 17:54                   ` Corey Minyard
@ 2016-06-30 18:22                     ` Borislav Petkov
  2016-06-30 19:44                       ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-06-30 18:22 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users, Corey Minyard

On Thu, Jun 30, 2016 at 12:54:14PM -0500, Corey Minyard wrote:
> It won't crash.  If you disable PREEMPT_RT on the 3.10-rt kernel it won't
> crash (which I have tested).  With PREEMPT_RT, the kernel creates a
> separate thread that is woken on mce notifications.  The trouble is
> that the interrupts are initialized before the thread is created.

Hmmm.

Ok, so I don't have any idea what RT does but from looking at your splat:

[    0.164153] Call Trace:
[    0.164165]  <IRQ>
[    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
[    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
[    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
[    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
[    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
[    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
[    0.164223]  <EOI>
[    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
[    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
[    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
[    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
[    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
[    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
[    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
[    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
[    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5 41
54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00 <f0>
66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
[    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
[    0.164298]  RSP <ffff88017fa03f00>
[    0.164299] CR2: 0000000000000600
[    0.656225] ---[ end trace 0000000000000001 ]---
[    0.656233] Kernel panic - not syncing: Fatal exception in interrupt

we're 0.16 seconds within the boot and we're just initializing the local
APIC and the moment that happens, we get a thresholding APIC interrupt.

So how can interrupts be initialized before that?

I'm genuinely asking because I can't imagine how CMCI can get initialized
*after* the local APIC init.

Because, we do init CMCI in identify_cpu()->mcheck_cpu_init() and that
happens earlier than your splat. You can even see where it happens in
dmesg:

[    0.049270] mce: CPU supports 22 MCE banks
[    0.049383] CPU0: Thermal monitoring enabled (TM1)

First line is __mcheck_cpu_cap_init(), second is intel_init_thermal().

The CMCI initialization is done right after it in

void mce_intel_feature_init(struct cpuinfo_x86 *c)
{
        intel_init_thermal(c);
        intel_init_cmci();


but wait!, this is the upstream kernel. Where can I look at 3.10-rt
sources?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 18:22                     ` Borislav Petkov
@ 2016-06-30 19:44                       ` Corey Minyard
  2016-06-30 20:34                         ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 19:44 UTC (permalink / raw)
  To: Borislav Petkov, Corey Minyard; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users

On 06/30/2016 01:22 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 12:54:14PM -0500, Corey Minyard wrote:
>> It won't crash.  If you disable PREEMPT_RT on the 3.10-rt kernel it won't
>> crash (which I have tested).  With PREEMPT_RT, the kernel creates a
>> separate thread that is woken on mce notifications.  The trouble is
>> that the interrupts are initialized before the thread is created.
> Hmmm.
>
> Ok, so I don't have any idea what RT does but from looking at your splat:
>
> [    0.164153] Call Trace:
> [    0.164165]  <IRQ>
> [    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
> [    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
> [    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
> [    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
> [    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
> [    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
> [    0.164223]  <EOI>
> [    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
> [    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
> [    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
> [    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
> [    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> [    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
> [    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
> [    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> [    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5 41
> 54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00 <f0>
> 66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
> [    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
> [    0.164298]  RSP <ffff88017fa03f00>
> [    0.164299] CR2: 0000000000000600
> [    0.656225] ---[ end trace 0000000000000001 ]---
> [    0.656233] Kernel panic - not syncing: Fatal exception in interrupt
>
> we're 0.16 seconds within the boot and we're just initializing the local
> APIC and the moment that happens, we get a thresholding APIC interrupt.
>
> So how can interrupts be initialized before that?

I don't think they are.  I think there is something about this
particular board.  We aren't having any issues with other systems.

But as you say, the kernel should be ready for this.

>
> I'm genuinely asking because I can't imagine how CMCI can get initialized
> *after* the local APIC init.
>
> Because, we do init CMCI in identify_cpu()->mcheck_cpu_init() and that
> happens earlier than your splat. You can even see where it happens in
> dmesg:
>
> [    0.049270] mce: CPU supports 22 MCE banks
> [    0.049383] CPU0: Thermal monitoring enabled (TM1)
>
> First line is __mcheck_cpu_cap_init(), second is intel_init_thermal().
>
> The CMCI initialization is done right after it in
>
> void mce_intel_feature_init(struct cpuinfo_x86 *c)
> {
>          intel_init_thermal(c);
>          intel_init_cmci();
>
>
> but wait!, this is the upstream kernel. Where can I look at 3.10-rt
> sources?

They are at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git v3.10-rt

-corey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 19:44                       ` Corey Minyard
@ 2016-06-30 20:34                         ` Borislav Petkov
  2016-06-30 22:47                           ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-06-30 20:34 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Corey Minyard, Luck, Tony, Steven Rostedt, linux-rt-users

On Thu, Jun 30, 2016 at 02:44:42PM -0500, Corey Minyard wrote:
> >[    0.164153] Call Trace:
> >[    0.164165]  <IRQ>
> >[    0.164185]  [<ffffffff8106dcd8>] try_to_wake_up+0x28/0x320
> >[    0.164188]  [<ffffffff8106dfe0>] wake_up_process+0x10/0x20
> >[    0.164207]  [<ffffffff8101c548>] mce_notify_irq+0x28/0x30
> >[    0.164210]  [<ffffffff8101df35>] intel_threshold_interrupt+0xb5/0xd0
> >[    0.164213]  [<ffffffff8101e88c>] smp_threshold_interrupt+0x1c/0x40
> >[    0.164221]  [<ffffffff816f9b5a>] threshold_interrupt+0x6a/0x70
> >[    0.164223]  <EOI>
> >[    0.164226]  [<ffffffff8101dda7>] ? cmci_recheck+0x67/0x70
> >[    0.164241]  [<ffffffff816e9777>] setup_local_APIC+0x276/0x283
> >[    0.164259]  [<ffffffff81caf010>] native_smp_prepare_cpus+0x379/0x43b
> >[    0.164266]  [<ffffffff81ca3e4f>] kernel_init_freeable+0xd7/0x21a
> >[    0.164270]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> >[    0.164272]  [<ffffffff816df1f9>] kernel_init+0x9/0x180
> >[    0.164275]  [<ffffffff816f8dc8>] ret_from_fork+0x58/0x90
> >[    0.164277]  [<ffffffff816df1f0>] ? rest_init+0x90/0x90
> >[    0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5 41
> >54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00 <f0>
> >66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
> >[    0.164298] RIP  [<ffffffff816f344d>] _raw_spin_lock_irqsave+0x1d/0x50
> >[    0.164298]  RSP <ffff88017fa03f00>
> >[    0.164299] CR2: 0000000000000600
> >[    0.656225] ---[ end trace 0000000000000001 ]---
> >[    0.656233] Kernel panic - not syncing: Fatal exception in interrupt
> >
> >we're 0.16 seconds within the boot and we're just initializing the local
> >APIC and the moment that happens, we get a thresholding APIC interrupt.
> >
> >So how can interrupts be initialized before that?
> 
> I don't think they are.  I think there is something about this
> particular board.  We aren't having any issues with other systems.

Right, so the fact that it raises the thresholding interrupt could
mean that it generates a bunch of correctable ECC errors and it hits a
threshold which is signalled by that interrupt.
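
To make the thresholding idea concrete, here is a small userspace
model, not the real CMCI hardware or kernel code.  The struct ce_bank
type, its fields and ce_bank_record() are hypothetical, and resetting
the count after the interrupt is an assumption made to keep the model
simple.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one error bank with a corrected-error threshold. */
struct ce_bank {
	unsigned int count;		/* corrected errors since the last interrupt */
	unsigned int threshold;		/* raise the interrupt at this count */
	unsigned int interrupts;	/* threshold interrupts raised so far */
};

/*
 * Record one corrected error.  Returns true when this error makes the
 * count reach the threshold, i.e. when the interrupt would be raised.
 */
static bool ce_bank_record(struct ce_bank *b)
{
	if (++b->count < b->threshold)
		return false;
	b->count = 0;			/* assumption: start counting afresh */
	b->interrupts++;
	return true;
}

/* Three errors against a threshold of three raise exactly one interrupt. */
static void ce_bank_demo(void)
{
	struct ce_bank bank = { .count = 0, .threshold = 3, .interrupts = 0 };

	assert(!ce_bank_record(&bank));
	assert(!ce_bank_record(&bank));
	assert(ce_bank_record(&bank));	/* third error hits the threshold */
	assert(bank.interrupts == 1);
}
```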

And if that is true, then you should be seeing some errors in mcelog or
sb_edac reporting some.

You could, just in case, try latest upstream and enable
CONFIG_EDAC_SBRIDGE and check dmesg for some ECCs.

Or, of course, something else entirely might be funny with that box,
causing that interrupt to fire.

> But as you say, the kernel should be ready for this.

Right, and we've removed that mce_notify_irq() call in
intel_threshold_interrupt() with

  f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

but that's more of a side-effect of that patch.

And if you want to backport it, you'd need the mce_gen_pool_add() and
remaining machinery for the genpool.

Presumably, booting with "mce=no_cmci" should fix this but then you
won't have the CMCI thresholding, i.e., the interrupt which gets raised
when a certain amount of correctable errors has been generated.

Hmm, a funny box that.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 20:34                         ` Borislav Petkov
@ 2016-06-30 22:47                           ` Corey Minyard
  2016-07-01  7:20                             ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-06-30 22:47 UTC (permalink / raw)
  To: Borislav Petkov, Corey Minyard; +Cc: Luck, Tony, Steven Rostedt, linux-rt-users

On 06/30/2016 03:34 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 02:44:42PM -0500, Corey Minyard wrote:
>> I don't think they are.  I think there is something about this
>> particular board.  We aren't having any issues with other systems.
> Right, so the fact that it raises the thresholding interrupt could
> mean that it generates a bunch of correctable ECC errors and it hits a
> threshold which is signalled by that interrupt.
>
> And if that is true, then you should be seeing some errors in mcelog or
> sb_edac reporting some.
>
> You could, just in case, try latest upstream and enable
> CONFIG_EDAC_SBRIDGE and check dmesg for some ECCs.
>
> Or, of course, something else entirely might be funny with that box,
> causing that interrupt to fire.

You are right, I enabled that on the tip of master and I get the
following spewing out for a while:

EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 
(channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -  
OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)

So there's apparently something broken in the hardware.

>> But as you say, the kernel should be ready for this.
> Right, and we've removed that mce_notify_irq() call in
> intel_threshold_interrupt() with
>
>    f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
>
> but that's more of a side-effect of that patch.
>
> And if you want to backport it, you'd need the mce_gen_pool_add() and
> remaining machinery for the genpool.

That sounds like a bit much.

Steven, what would you like to do here?

Thanks,

-corey

> Presumably, booting with "mce=no_cmci" should fix this but then you
> won't have the CMCI thresholding, i.e., the interrupt which gets raised
> when a certain amount of correctable errors has been generated.
>
> Hmm, a funny box that.
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 22:47                           ` Corey Minyard
@ 2016-07-01  7:20                             ` Borislav Petkov
  2016-07-06  0:59                               ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-07-01  7:20 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Corey Minyard, Luck, Tony, Steven Rostedt, linux-rt-users

On Thu, Jun 30, 2016 at 05:47:29PM -0500, Corey Minyard wrote:
> You are right, I enabled that on the tip of master and I get the
> following spewing out for a while:
>
> EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
> (channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -  OVERFLOW
> area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)
>
> So there's apparently something broken in the hardware.

Yeah, DIMM0 on your socket 0 is generating a bunch of correctable errors
and might go bad soon, the stress being on "might". You could replace
it.

> That sounds like a bit much.

Actually, you probably would need only a couple:

1. 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save error records")

2. 061120aed708 ("x86/mce: Don't use percpu workqueues")
 - that one is unrelated but should be nice for RT as it gets rid of percpu
   workqueues and I know RT hates them :)

3. fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
 - this one connects the genpool to MCE

4. f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
 - and this is the last one which I meant earlier.

So that's 4 patches, more or less.

Now, you're in the perfect position to test those because you *actually*
have a real-life system which generates those errors so it is the
perfect candidate for testing the backports. And you should test them
with the failing DIMM still in place, of course.

HTH.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-06-30 16:01       ` Borislav Petkov
  2016-06-30 16:17         ` Luck, Tony
@ 2016-07-01  9:20         ` Daniel Wagner
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel Wagner @ 2016-07-01  9:20 UTC (permalink / raw)
  To: Borislav Petkov, Steven Rostedt
  Cc: Corey Minyard, linux-rt-users, Corey Minyard, Tony Luck

On 06/30/2016 06:01 PM, Borislav Petkov wrote:
>>>>>   #ifdef CONFIG_PREEMPT_RT_FULL
>>>>> +static bool notify_work_ready __read_mostly;
>>>>>   struct task_struct *mce_notify_helper;
>>>>>   
>>>>>   static int mce_notify_helper_thread(void *unused)
>>>>> @@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
> 
> Hmm, what is mce_notify_work_init() ?
> 
> This must be some RT-homegrown thing.
> 
> What is it supposed to do? Upstream is much different from 3.10 or
> whatever that kernel version is.

The notify work snippet defers the execution of mce_notify_irq to a
kthread using swork (simple work, which is based on simple wait [1]).

mce_notify_irq() can't be run in IRQ context because of
wake_up_interruptible() which calls mutex_lock() eventually with
CONFIG_PREEMPT_RT_FULL.
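
The deferral pattern can be sketched in userspace roughly as follows.
This is a single-threaded model, not the swork implementation: the
names irq_handler(), helper_run_once() and notify_user() are
hypothetical, and a C11 atomic flag stands in for whatever queueing the
real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool notify_pending;	/* set in "IRQ context", cleared by the helper */
static int notifications_handled;	/* runs of the deferred work */

/* Stand-in for the part that may sleep and so must not run in IRQ context. */
static void notify_user(void)
{
	notifications_handled++;
}

/* "IRQ context": only record that work is pending, never call notify_user(). */
static void irq_handler(void)
{
	atomic_store(&notify_pending, true);
}

/* One pass of the helper thread's loop, in a context that is allowed to sleep. */
static bool helper_run_once(void)
{
	if (!atomic_exchange(&notify_pending, false))
		return false;		/* nothing queued */
	notify_user();
	return true;
}

/* Two back-to-back interrupts collapse into one deferred notification. */
static void deferral_demo(void)
{
	irq_handler();
	irq_handler();
	assert(helper_run_once());
	assert(!helper_run_once());
	assert(notifications_handled == 1);
}
```

Note that in this model several interrupts arriving before the helper
runs are merged into a single notification; that is a property of the
flag-based sketch, not a claim about swork.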

cheers,
daniel

[1]
https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/tree/include/linux/swork.h?h=linux-4.6.y-rt

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/tree/kernel/sched/swork.c?h=linux-4.6.y-rt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-01  7:20                             ` Borislav Petkov
@ 2016-07-06  0:59                               ` Corey Minyard
  2016-07-06  8:37                                 ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-07-06  0:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Corey Minyard, Luck, Tony, Steven Rostedt, linux-rt-users

On 07/01/2016 02:20 AM, Borislav Petkov wrote:
>> That sounds like a bit much.
> Actually, you probably would need only a couple:
>
> 1. 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save error records")
>
> 2. 061120aed708 ("x86/mce: Don't use percpu workqueues")
>   - that one is unrelated but should be nice for RT as it gets rid of percpu
>     workqueues and I know RT hates them :)
>
> 3. fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
>   - this one connects the genpool to MCE
>
> 4. f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
>   - and this is the last one which I meant earlier.
>
> So that's 4 patches, more or less.
>
> Now, you're in the perfect position to test those because you *actually*
> have a real-life system which generates those errors so it is the
> perfect candidate for testing the backports. And you should test them
> with the failing DIMM still in place, of course.

I'm having our hardware people keep the system as-is until we can
track this down.

I applied the above four patches, plus a few more support patches that
were needed, but no love.  Exact same issue.  Well, almost the same, here's
the traceback:

[    0.455575]  [<ffffffff810733c4>] try_to_wake_up+0x34/0x300
[    0.455590]  [<ffffffff81067d76>] ? __hrtimer_start_range_ns+0x226/0x3a0
[    0.455593]  [<ffffffff810736e0>] wake_up_process+0x10/0x20
[    0.455615]  [<ffffffff8101c7a8>] mce_notify_irq+0x28/0x30
[    0.455621]  [<ffffffff8101cbd9>] mce_irq_work_cb+0x9/0x10
[    0.455646]  [<ffffffff810cbb0c>] irq_work_run_list+0x3c/0x60
[    0.455649]  [<ffffffff810cbe97>] irq_work_tick_soft+0x27/0x30
[    0.455673]  [<ffffffff8104dbe4>] run_timer_softirq+0x24/0x250
[    0.455681]  [<ffffffff81045bce>] do_current_softirqs+0x1ae/0x250
[    0.455684]  [<ffffffff81045c9e>] run_ksoftirqd+0x2e/0x50
[    0.455697]  [<ffffffff8106c7f6>] smpboot_thread_fn+0x206/0x320
[    0.455700]  [<ffffffff8106c5f0>] ? lg_global_unlock+0x60/0x60
[    0.455720]  [<ffffffff81063cad>] kthread+0xad/0xc0
[    0.455740]  [<ffffffff81730303>] ? _dbgp_external_startup+0x236/0x392
[    0.455744]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
[    0.455752]  [<ffffffff8173a4be>] ret_from_fork+0x4e/0x80
[    0.455756]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130


So it crashed in the kthread instead of the irq, but it is exactly the same
issue: that particular field is not initialized.  Not that these aren't
patches that look like good ideas.

-corey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-06  0:59                               ` Corey Minyard
@ 2016-07-06  8:37                                 ` Borislav Petkov
  2016-07-06 12:03                                   ` Corey Minyard
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2016-07-06  8:37 UTC (permalink / raw)
  To: Corey Minyard, Steven Rostedt; +Cc: Corey Minyard, Luck, Tony, linux-rt-users

On Tue, Jul 05, 2016 at 07:59:59PM -0500, Corey Minyard wrote:
> I'm having our hardware people keep the system as-is until we can
> track this down.
> 
> I applied the above four patches, plus a few more support patches that
> were needed, but no love.  Exact same issue.  Well, almost the same, here's
> the traceback:
> 
> [    0.455575]  [<ffffffff810733c4>] try_to_wake_up+0x34/0x300
> [    0.455590]  [<ffffffff81067d76>] ? __hrtimer_start_range_ns+0x226/0x3a0
> [    0.455593]  [<ffffffff810736e0>] wake_up_process+0x10/0x20
> [    0.455615]  [<ffffffff8101c7a8>] mce_notify_irq+0x28/0x30
> [    0.455621]  [<ffffffff8101cbd9>] mce_irq_work_cb+0x9/0x10
> [    0.455646]  [<ffffffff810cbb0c>] irq_work_run_list+0x3c/0x60
> [    0.455649]  [<ffffffff810cbe97>] irq_work_tick_soft+0x27/0x30
> [    0.455673]  [<ffffffff8104dbe4>] run_timer_softirq+0x24/0x250
> [    0.455681]  [<ffffffff81045bce>] do_current_softirqs+0x1ae/0x250
> [    0.455684]  [<ffffffff81045c9e>] run_ksoftirqd+0x2e/0x50
> [    0.455697]  [<ffffffff8106c7f6>] smpboot_thread_fn+0x206/0x320
> [    0.455700]  [<ffffffff8106c5f0>] ? lg_global_unlock+0x60/0x60
> [    0.455720]  [<ffffffff81063cad>] kthread+0xad/0xc0
> [    0.455740]  [<ffffffff81730303>] ? _dbgp_external_startup+0x236/0x392
> [    0.455744]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
> [    0.455752]  [<ffffffff8173a4be>] ret_from_fork+0x4e/0x80
> [    0.455756]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
> 
> 
> So it crashed in the kthread instead of the irq, but exactly the same issue,
> that particular field is not initialized.  Not that these aren't patches
> that look like good ideas.

Hmm, so this looks like RT-specific now AFAICT.

mce_notify_irq() calls mce_notify_work() and on RT_FULL that's
trying to wake up mce_notify_helper which is not initialized yet -
mce_notify_work_init() happens later in a device_initcall_sync.

Would something as trivial as this work in your case?

---
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..cc70d98a30f6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
-	wake_up_process(mce_notify_helper);
+	if (mce_notify_helper)
+		wake_up_process(mce_notify_helper);
 }
 #else
 static void mce_notify_work(void)


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-06  8:37                                 ` Borislav Petkov
@ 2016-07-06 12:03                                   ` Corey Minyard
  2016-07-06 13:32                                     ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Minyard @ 2016-07-06 12:03 UTC (permalink / raw)
  To: Borislav Petkov, Corey Minyard, Steven Rostedt
  Cc: Luck, Tony, linux-rt-users, Sebastian Sewior

On 07/06/2016 03:37 AM, Borislav Petkov wrote:
> On Tue, Jul 05, 2016 at 07:59:59PM -0500, Corey Minyard wrote:
>> I'm having our hardware people keep the system as-is until we can
>> track this down.
>>
>> I applied the above four patches, plus a few more support patches that
>> were needed, but no love.  Exact same issue.  Well, almost the same, here's
>> the traceback:
>>
>> [    0.455575]  [<ffffffff810733c4>] try_to_wake_up+0x34/0x300
>> [    0.455590]  [<ffffffff81067d76>] ? __hrtimer_start_range_ns+0x226/0x3a0
>> [    0.455593]  [<ffffffff810736e0>] wake_up_process+0x10/0x20
>> [    0.455615]  [<ffffffff8101c7a8>] mce_notify_irq+0x28/0x30
>> [    0.455621]  [<ffffffff8101cbd9>] mce_irq_work_cb+0x9/0x10
>> [    0.455646]  [<ffffffff810cbb0c>] irq_work_run_list+0x3c/0x60
>> [    0.455649]  [<ffffffff810cbe97>] irq_work_tick_soft+0x27/0x30
>> [    0.455673]  [<ffffffff8104dbe4>] run_timer_softirq+0x24/0x250
>> [    0.455681]  [<ffffffff81045bce>] do_current_softirqs+0x1ae/0x250
>> [    0.455684]  [<ffffffff81045c9e>] run_ksoftirqd+0x2e/0x50
>> [    0.455697]  [<ffffffff8106c7f6>] smpboot_thread_fn+0x206/0x320
>> [    0.455700]  [<ffffffff8106c5f0>] ? lg_global_unlock+0x60/0x60
>> [    0.455720]  [<ffffffff81063cad>] kthread+0xad/0xc0
>> [    0.455740]  [<ffffffff81730303>] ? _dbgp_external_startup+0x236/0x392
>> [    0.455744]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
>> [    0.455752]  [<ffffffff8173a4be>] ret_from_fork+0x4e/0x80
>> [    0.455756]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
>>
>>
>> So it crashed in the kthread instead of the irq, but exactly the same issue,
>> that particular field is not initialized.  Not that these aren't patches
>> that look like good ideas.
> Hmm, so this looks like RT-specific now AFAICT.
>
> mce_notify_irq() calls mce_notify_work() and on RT_FULL that's
> trying to wake up mce_notify_helper which is not initialized yet -
> mce_notify_work_init() happens later in a device_initcall_sync.
>
> Would something as trivial as this work in your case?
>
> ---
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index aaf4b9b94f38..cc70d98a30f6 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
>   
>   static void mce_notify_work(void)
>   {
> -	wake_up_process(mce_notify_helper);
> +	if (mce_notify_helper)
> +		wake_up_process(mce_notify_helper);
>   }
>   #else
>   static void mce_notify_work(void)
>
>
I did think about that option, but I'm not sure why the current RT patch
has that as a separate bool.

This appears to come in here:

http://www.spinics.net/lists/linux-rt-users/msg12779.html

I'm copying Sebastian, who appears to be the original source of this
change.

-corey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-06 12:03                                   ` Corey Minyard
@ 2016-07-06 13:32                                     ` Steven Rostedt
  2016-07-06 13:43                                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2016-07-06 13:32 UTC (permalink / raw)
  To: Corey Minyard
  Cc: Borislav Petkov, Corey Minyard, Luck, Tony, linux-rt-users,
	Sebastian Sewior

On Wed, 6 Jul 2016 07:03:43 -0500
Corey Minyard <cminyard@mvista.com> wrote:

> > ---
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index aaf4b9b94f38..cc70d98a30f6 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
> >   
> >   static void mce_notify_work(void)
> >   {
> > -	wake_up_process(mce_notify_helper);
> > +	if (mce_notify_helper)
> > +		wake_up_process(mce_notify_helper);
> >   }

Actually, this appears to be the fix in 4.6-rt.

> >   #else
> >   static void mce_notify_work(void)
> >
> >  
> I did think about that option, but I'm not sure why the current RT patch
> has that as a separate bool.
> 
> This appears to come in here:
> 
> http://www.spinics.net/lists/linux-rt-users/msg12779.html
> 
> I'm copying Sebastian, who appears to be the original source of this
> change.

You can see why this is different by looking at the commit that changed
it. Or this email:

 lkml.kernel.org/r/1365704626.9609.55.camel@gandalf.local.home

-- Steve


* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-06 13:32                                     ` Steven Rostedt
@ 2016-07-06 13:43                                       ` Sebastian Andrzej Siewior
  2016-07-11 17:32                                         ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-07-06 13:43 UTC (permalink / raw)
  To: Steven Rostedt, Corey Minyard
  Cc: Borislav Petkov, Corey Minyard, Luck, Tony, linux-rt-users

On 07/06/2016 03:32 PM, Steven Rostedt wrote:
> On Wed, 6 Jul 2016 07:03:43 -0500
> Corey Minyard <cminyard@mvista.com> wrote:
> 
>>> ---
>>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>>> index aaf4b9b94f38..cc70d98a30f6 100644
>>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>>> @@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
>>>   
>>>   static void mce_notify_work(void)
>>>   {
>>> -	wake_up_process(mce_notify_helper);
>>> +	if (mce_notify_helper)
>>> +		wake_up_process(mce_notify_helper);
>>>   }
> 
> Actually, this appears to be the fix in 4.6-rt.

This has been in there since it was switched to swork instead of using its
own thread, and it is also in v4.1:

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/tree/arch/x86/kernel/cpu/mcheck/mce.c?h=linux-4.1.y-rt#n1387

as part of x86-mce-use-swait-queue-for-mce-wakeups.patch. That patch first
appeared in v3.18.9-rt4 as a new patch, dropping
x86-mce-Defer-mce-wakeups-to-threads-for-PREEMPT_RT.patch.

> -- Steve

Sebastian


* Re: [PATCH][RT] x86: Fix an RT MCE crash
  2016-07-06 13:43                                       ` Sebastian Andrzej Siewior
@ 2016-07-11 17:32                                         ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2016-07-11 17:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Corey Minyard, Borislav Petkov, Corey Minyard, Luck, Tony,
	linux-rt-users

On Wed, 6 Jul 2016 15:43:51 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 07/06/2016 03:32 PM, Steven Rostedt wrote:
> > On Wed, 6 Jul 2016 07:03:43 -0500
> > Corey Minyard <cminyard@mvista.com> wrote:
> >   
> >>> ---
> >>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> >>> index aaf4b9b94f38..cc70d98a30f6 100644
> >>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> >>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> >>> @@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
> >>>   
> >>>   static void mce_notify_work(void)
> >>>   {
> >>> -	wake_up_process(mce_notify_helper);
> >>> +	if (mce_notify_helper)
> >>> +		wake_up_process(mce_notify_helper);
> >>>   }  
> > 
> > Actually, this appears to be the fix in 4.6-rt.  
> 
> This is in there since it was switched to swork instead of using its
> own thread and is also in v4.1:
> 
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/tree/arch/x86/kernel/cpu/mcheck/mce.c?h=linux-4.1.y-rt#n1387
> 
> as part of x86-mce-use-swait-queue-for-mce-wakeups.patch. And this
> patch was first part of v3.18.9-rt4 as a new patch dropping
> x86-mce-Defer-mce-wakeups-to-threads-for-PREEMPT_RT.patch.
> 

I'm doing backports now. I'm currently on 4.1, but trying to move
quickly.

Corey, what commits did you backport (plus the one Boris showed)? And
does that fix your issue?

-- Steve


Thread overview: 25+ messages
2016-06-30 13:24 [PATCH][RT] x86: Fix an RT MCE crash minyard
2016-06-30 13:43 ` Steven Rostedt
2016-06-30 14:49   ` Corey Minyard
2016-06-30 15:51     ` Steven Rostedt
2016-06-30 15:58       ` Corey Minyard
2016-06-30 16:01       ` Borislav Petkov
2016-06-30 16:17         ` Luck, Tony
2016-06-30 16:40           ` Corey Minyard
2016-06-30 17:01             ` Borislav Petkov
2016-06-30 17:18               ` Corey Minyard
2016-06-30 17:26                 ` Borislav Petkov
2016-06-30 17:54                   ` Corey Minyard
2016-06-30 18:22                     ` Borislav Petkov
2016-06-30 19:44                       ` Corey Minyard
2016-06-30 20:34                         ` Borislav Petkov
2016-06-30 22:47                           ` Corey Minyard
2016-07-01  7:20                             ` Borislav Petkov
2016-07-06  0:59                               ` Corey Minyard
2016-07-06  8:37                                 ` Borislav Petkov
2016-07-06 12:03                                   ` Corey Minyard
2016-07-06 13:32                                     ` Steven Rostedt
2016-07-06 13:43                                       ` Sebastian Andrzej Siewior
2016-07-11 17:32                                         ` Steven Rostedt
2016-07-01  9:20         ` Daniel Wagner
2016-06-30 16:04       ` Corey Minyard
