linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* MCE handler gets NIP wrong on MPC8378
@ 2020-02-18 17:07 Radu Rendec
  2020-02-18 18:08 ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-18 17:07 UTC (permalink / raw)
  To: linuxppc-dev

Hi Everyone,

The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
but I have seen other weird values.

I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.

I also noticed most of the code has moved to head_32.h in newer kernel
versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
the same. I guess the same thing happens on a recent kernel, even though
I don't have an easy way to test it.

The original MCE that I see is triggered by a failed PCIe transaction,
but I was able to reproduce it by just reading from a (physically)
unmapped memory area. Sample code and kernel crash dump are included
below.

Can anyone please provide any suggestion as to what to look at next?

Thanks,
Radu

8<--------------------------------------------------------------------

#include <linux/module.h>
#include <linux/delay.h>
#include <asm/io.h>

static void __iomem *bad_addr_base;

static int __init test_mce_init(void)
{
        unsigned int x;

        bad_addr_base = ioremap(0xf0000000, 0x100);

        if (bad_addr_base) {
                __asm__ __volatile__ ("isync");
                x = ioread32(bad_addr_base);
                pr_info("Test: %#0x\n", x);
        } else
                pr_err("Cannot map\n");

        return 0;
}

static void __exit test_mce_exit(void)
{
        iounmap(bad_addr_base);
}

module_init(test_mce_init);
module_exit(test_mce_exit);

MODULE_LICENSE("GPL");

8<--------------------------------------------------------------------

[   14.977053] mce: loading out-of-tree module taints kernel.
[   15.004285] Disabling lock debugging due to kernel taint
[   15.026151] Machine check in kernel mode.
[   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer
error ack signal
[   15.037652] Oops: Machine check 1, sig: 7 [#1]
[   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
[   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
[   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M     C O
4.9.191-default-mpc8378-p3c692a64ae1d #31
[   15.078175] task: 9e83e550 task.stack: 9ea2e000
[   15.082699] NIP: 00000900 LR: b147e030 CTR: 80015d50
[   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M     C O
(4.9.191-default-mpc8378-p3c692a64ae1d)
[   15.098084] MSR: 00041000 <ME>[   15.100973]   CR: 42002228  XER: 00000000
[   15.104976] DAR: 80017414 DSISR: 00000000
GPR00: b147e030 9ea2fd50 9e83e550 00000000 b1480000 9c652200 9ea2fd18 00000000
GPR08: 9c652200 00000000 b1480000 00001032 80015d50 100b93d0 b147e308 805eb3d8
GPR16: 0000003a 00000550 b1473b5c b147c2a4 8048e444 80082b08 00000000 b147c0e8
GPR24: 00000124 00000578 00000000 00000000 b147c0a0 b147e000 9eb7c280 b147c0a0
NIP [00000900] 0x900
[   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.144528] Call Trace:
[   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
[   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
[   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
[   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
[   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
[   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
[   15.188005] --- interrupt: c01 at 0xfdfdb64
[   15.188005]     LR = 0x10013c64
[   15.195309] Instruction dump:
[   15.198274] 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX
00000000 XXXXXXXX
[   15.206047] 00000000 XXXXXXXX 00000000 XXXXXXXX 7d5043a6 XXXXXXXX
7d400026 XXXXXXXX
[   15.213824] ---[ end trace d38922938e009d45 ]---
[   15.218434]
[   16.219951] Kernel panic - not syncing: Fatal exception
[   16.225174] Rebooting in 1 seconds..

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-18 17:07 MCE handler gets NIP wrong on MPC8378 Radu Rendec
@ 2020-02-18 18:08 ` Christophe Leroy
  2020-02-19 15:11   ` Radu Rendec
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2020-02-18 18:08 UTC (permalink / raw)
  To: Radu Rendec, linuxppc-dev



Le 18/02/2020 à 18:07, Radu Rendec a écrit :
> Hi Everyone,
> 
> The saved NIP seems to be broken inside machine_check_exception() on
> MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
> but I have seen other weird values.
> 
> I've been able to track down the entry code to head_32.S (vector 0x200),
> but I'm not sure where/how the NIP value (where the exception occurred)
> is captured.

NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and 
saved into _NIP(r11) in transfer_to_handler in entry_32.S

Can something clobber r12 at some point ?

Maybe add the following at some place to trap when it happens ?

tweqi r12, 0x900

If you put it just after reading SRR0, and just before writing into 
NIP(r11), you'll see if its wrong from the begining or if it is 
overwriten later.

Christophe

> 
> I also noticed most of the code has moved to head_32.h in newer kernel
> versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
> the same. I guess the same thing happens on a recent kernel, even though
> I don't have an easy way to test it.
> 
> The original MCE that I see is triggered by a failed PCIe transaction,
> but I was able to reproduce it by just reading from a (physically)
> unmapped memory area. Sample code and kernel crash dump are included
> below.
> 
> Can anyone please provide any suggestion as to what to look at next?
> 
> Thanks,
> Radu
> 
> 8<--------------------------------------------------------------------
> 
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <asm/io.h>
> 
> static void __iomem *bad_addr_base;
> 
> static int __init test_mce_init(void)
> {
>          unsigned int x;
> 
>          bad_addr_base = ioremap(0xf0000000, 0x100);
> 
>          if (bad_addr_base) {
>                  __asm__ __volatile__ ("isync");
>                  x = ioread32(bad_addr_base);
>                  pr_info("Test: %#0x\n", x);
>          } else
>                  pr_err("Cannot map\n");
> 
>          return 0;
> }
> 
> static void __exit test_mce_exit(void)
> {
>          iounmap(bad_addr_base);
> }
> 
> module_init(test_mce_init);
> module_exit(test_mce_exit);
> 
> MODULE_LICENSE("GPL");
> 
> 8<--------------------------------------------------------------------
> 
> [   14.977053] mce: loading out-of-tree module taints kernel.
> [   15.004285] Disabling lock debugging due to kernel taint
> [   15.026151] Machine check in kernel mode.
> [   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer
> error ack signal
> [   15.037652] Oops: Machine check 1, sig: 7 [#1]
> [   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
> [   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
> x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
> spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
> [   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M     C O
> 4.9.191-default-mpc8378-p3c692a64ae1d #31
> [   15.078175] task: 9e83e550 task.stack: 9ea2e000
> [   15.082699] NIP: 00000900 LR: b147e030 CTR: 80015d50
> [   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M     C O
> (4.9.191-default-mpc8378-p3c692a64ae1d)
> [   15.098084] MSR: 00041000 <ME>[   15.100973]   CR: 42002228  XER: 00000000
> [   15.104976] DAR: 80017414 DSISR: 00000000
> GPR00: b147e030 9ea2fd50 9e83e550 00000000 b1480000 9c652200 9ea2fd18 00000000
> GPR08: 9c652200 00000000 b1480000 00001032 80015d50 100b93d0 b147e308 805eb3d8
> GPR16: 0000003a 00000550 b1473b5c b147c2a4 8048e444 80082b08 00000000 b147c0e8
> GPR24: 00000124 00000578 00000000 00000000 b147c0a0 b147e000 9eb7c280 b147c0a0
> NIP [00000900] 0x900
> [   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
> [   15.144528] Call Trace:
> [   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
> [   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
> [   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
> [   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
> [   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
> [   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
> [   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
> [   15.188005] --- interrupt: c01 at 0xfdfdb64
> [   15.188005]     LR = 0x10013c64
> [   15.195309] Instruction dump:
> [   15.198274] 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX
> 00000000 XXXXXXXX
> [   15.206047] 00000000 XXXXXXXX 00000000 XXXXXXXX 7d5043a6 XXXXXXXX
> 7d400026 XXXXXXXX
> [   15.213824] ---[ end trace d38922938e009d45 ]---
> [   15.218434]
> [   16.219951] Kernel panic - not syncing: Fatal exception
> [   16.225174] Rebooting in 1 seconds..
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-18 18:08 ` Christophe Leroy
@ 2020-02-19 15:11   ` Radu Rendec
  2020-02-19 19:46     ` Radu Rendec
  2020-02-26  0:01     ` Radu Rendec
  0 siblings, 2 replies; 13+ messages in thread
From: Radu Rendec @ 2020-02-19 15:11 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> Le 18/02/2020 à 18:07, Radu Rendec a écrit :
> > The saved NIP seems to be broken inside machine_check_exception() on
> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
> > but I have seen other weird values.
> >
> > I've been able to track down the entry code to head_32.S (vector 0x200),
> > but I'm not sure where/how the NIP value (where the exception occurred)
> > is captured.
>
> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> saved into _NIP(r11) in transfer_to_handler in entry_32.S

Thank you so much for the information, it is extremely helpful!

> Can something clobber r12 at some point ?
>
> Maybe add the following at some place to trap when it happens ?
>
> tweqi r12, 0x900
>
> If you put it just after reading SRR0, and just before writing into
> NIP(r11), you'll see if its wrong from the begining or if it is
> overwriten later.

I did something even simpler: I added the following

        lis r12,0x1234

... right after

        mfspr r12,SPRN_SRR0

... and now the NIP value I see in the crash dump is 0x12340000. This
means r12 is not clobbered and most likely the NIP value I normally see
is the actual SRR0 value.

Just to be sure that SRR0 is not clobbered before it's even saved to r12
(very unlikely though) I changed the code to save SRR0 to r8 at the very
beginning of the handler (first instruction, at address 0x200) and then
load r12 from r8 later. This of course clobbers r8, but it's good for
testing. Now in the crash dump I see 0x900 in both NIP and r8.

So I think I ruled out any problem in the Linux MCE handler. MPC8378 has
an e300 core and I double checked with the e300 core reference manual
(e300coreRM.pdf from NXP). I couldn't find anything weird there either.
Quoting from the RM:

| 5.5.2.1 Machine Check Interrupt Enabled (MSR[ME] = 1)
|
| When a machine check interrupt is taken, registers are updated as
| shown in Table 5-14.
|
| Table 5-14. Machine Check Interrupt—Register Settings
|
| SRR0 Set to the address of the next instruction that would have been
|      completed in the interrupted instruction stream. Neither this
|      instruction nor any others beyond it will have been completed.
|      All preceding instructions will have been completed.

At this point I'm assuming a silicon bug, although I couldn't find
anything interesting in the Errata provided by NXP.

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 15:11   ` Radu Rendec
@ 2020-02-19 19:46     ` Radu Rendec
  2020-02-19 21:08       ` Christophe Leroy
  2020-02-26  0:01     ` Radu Rendec
  1 sibling, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-19 19:46 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> > Le 18/02/2020 à 18:07, Radu Rendec a écrit :
> > > The saved NIP seems to be broken inside machine_check_exception() on
> > > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
> > > but I have seen other weird values.
> > >
> > > I've been able to track down the entry code to head_32.S (vector 0x200),
> > > but I'm not sure where/how the NIP value (where the exception occurred)
> > > is captured.
> >
> > NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> > saved into _NIP(r11) in transfer_to_handler in entry_32.S
> >
> > Can something clobber r12 at some point ?
> >
>
> I did something even simpler: I added the following
>
>       lis r12,0x1234
>
> ... right after
>
>       mfspr r12,SPRN_SRR0
>
> ... and now the NIP value I see in the crash dump is 0x12340000. This
> means r12 is not clobbered and most likely the NIP value I normally see
> is the actual SRR0 value.

I apologize for the noise. I just found out accidentally that the saved
NIP value is correct if interrupts are disabled at the time when the
faulty access that triggers the MCE occurs. This seems to happen
consistently.

By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
it's basically enough to wrap ioread32 to get the NIP value right.

Does this make any sense? Maybe it's not a silicon bug after all, or
maybe it is and I just found a workaround. Could this happen on other
PowerPC CPUs as well?

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 19:46     ` Radu Rendec
@ 2020-02-19 21:08       ` Christophe Leroy
  2020-02-19 21:21         ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2020-02-19 21:08 UTC (permalink / raw)
  To: Radu Rendec; +Cc: linuxppc-dev

Radu Rendec <radu.rendec@gmail.com> a écrit :

> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>> > Le 18/02/2020 à 18:07, Radu Rendec a écrit :
>> > > The saved NIP seems to be broken inside machine_check_exception() on
>> > > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
>> > > but I have seen other weird values.
>> > >
>> > > I've been able to track down the entry code to head_32.S (vector 0x200),
>> > > but I'm not sure where/how the NIP value (where the exception occurred)
>> > > is captured.
>> >
>> > NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
>> > saved into _NIP(r11) in transfer_to_handler in entry_32.S
>> >
>> > Can something clobber r12 at some point ?
>> >
>>
>> I did something even simpler: I added the following
>>
>>       lis r12,0x1234
>>
>> ... right after
>>
>>       mfspr r12,SPRN_SRR0
>>
>> ... and now the NIP value I see in the crash dump is 0x12340000. This
>> means r12 is not clobbered and most likely the NIP value I normally see
>> is the actual SRR0 value.
>
> I apologize for the noise. I just found out accidentally that the saved
> NIP value is correct if interrupts are disabled at the time when the
> faulty access that triggers the MCE occurs. This seems to happen
> consistently.
>
> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
> it's basically enough to wrap ioread32 to get the NIP value right.
>
> Does this make any sense? Maybe it's not a silicon bug after all, or
> maybe it is and I just found a workaround. Could this happen on other
> PowerPC CPUs as well?

Interesting.

0x900 is the adress of the timer interrupt.

Would the MCE occur just after the timer interrupt ?

Can you tell how are configured your IO busses, etc ... ?

Christophe



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 21:08       ` Christophe Leroy
@ 2020-02-19 21:21         ` Christophe Leroy
  2020-02-19 22:39           ` Radu Rendec
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2020-02-19 21:21 UTC (permalink / raw)
  To: Radu Rendec; +Cc: linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> a écrit :

> Radu Rendec <radu.rendec@gmail.com> a écrit :
>
>> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
>>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>> Le 18/02/2020 à 18:07, Radu Rendec a écrit :
>>>> > The saved NIP seems to be broken inside machine_check_exception() on
>>>> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
>>>> > but I have seen other weird values.
>>>> >
>>>> > I've been able to track down the entry code to head_32.S (vector 0x200),
>>>> > but I'm not sure where/how the NIP value (where the exception occurred)
>>>> > is captured.
>>>>
>>>> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
>>>> saved into _NIP(r11) in transfer_to_handler in entry_32.S
>>>>
>>>> Can something clobber r12 at some point ?
>>>>
>>>
>>> I did something even simpler: I added the following
>>>
>>>      lis r12,0x1234
>>>
>>> ... right after
>>>
>>>      mfspr r12,SPRN_SRR0
>>>
>>> ... and now the NIP value I see in the crash dump is 0x12340000. This
>>> means r12 is not clobbered and most likely the NIP value I normally see
>>> is the actual SRR0 value.
>>
>> I apologize for the noise. I just found out accidentally that the saved
>> NIP value is correct if interrupts are disabled at the time when the
>> faulty access that triggers the MCE occurs. This seems to happen
>> consistently.
>>
>> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
>> it's basically enough to wrap ioread32 to get the NIP value right.
>>
>> Does this make any sense? Maybe it's not a silicon bug after all, or
>> maybe it is and I just found a workaround. Could this happen on other
>> PowerPC CPUs as well?
>
> Interesting.
>
> 0x900 is the adress of the timer interrupt.
>
> Would the MCE occur just after the timer interrupt ?
>
> Can you tell how are configured your IO busses, etc ... ?

And what's the value of SERSR after the machine check ?

Do you use the local bus monitoring driver ?

Christophe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 21:21         ` Christophe Leroy
@ 2020-02-19 22:39           ` Radu Rendec
  2020-02-20  8:38             ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-19 22:39 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> > Radu Rendec <radu.rendec@gmail.com> a écrit :
> >> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
> >>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>> Le 18/02/2020 à 18:07, Radu Rendec a écrit :
> >>>> > The saved NIP seems to be broken inside machine_check_exception() on
> >>>> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
> >>>> > but I have seen other weird values.
> >>>> >
> >>>> > I've been able to track down the entry code to head_32.S (vector 0x200),
> >>>> > but I'm not sure where/how the NIP value (where the exception occurred)
> >>>> > is captured.
> >>>>
> >>>> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> >>>> saved into _NIP(r11) in transfer_to_handler in entry_32.S
> >>>>
> >>>> Can something clobber r12 at some point ?
> >>>>
> >>>
> >>> I did something even simpler: I added the following
> >>>
> >>>      lis r12,0x1234
> >>>
> >>> ... right after
> >>>
> >>>      mfspr r12,SPRN_SRR0
> >>>
> >>> ... and now the NIP value I see in the crash dump is 0x12340000. This
> >>> means r12 is not clobbered and most likely the NIP value I normally see
> >>> is the actual SRR0 value.
> >>
> >> I apologize for the noise. I just found out accidentally that the saved
> >> NIP value is correct if interrupts are disabled at the time when the
> >> faulty access that triggers the MCE occurs. This seems to happen
> >> consistently.
> >>
> >> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
> >> it's basically enough to wrap ioread32 to get the NIP value right.
> >>
> >> Does this make any sense? Maybe it's not a silicon bug after all, or
> >> maybe it is and I just found a workaround. Could this happen on other
> >> PowerPC CPUs as well?
> >
> > Interesting.
> >
> > 0x900 is the adress of the timer interrupt.
> >
> > Would the MCE occur just after the timer interrupt ?

I doubt that. I'm using a small test module to artificially trigger the
MCE. Basically it's just this (the full code is in my original post):

        bad_addr_base = ioremap(0xf0000000, 0x100);
        x = ioread32(bad_addr_base);

I find it hard to believe that every time I load the module the lwbrx
instruction that triggers the MCE is executed exactly after the timer
interrupt (or that the timer interrupt always occurs close to the lwbrx
instruction).

> >
> > Can you tell how are configured your IO busses, etc ... ?

Nothing special. The device tree is mostly similar to mpc8379_rdb.dts,
but I can provide the actual dts if you think it's relevant.

> And what's the value of SERSR after the machine check ?

I'm assuming you're talking about the IPIC SERSR register. I modified
machine_check_exception and added a call to ipic_get_mcp_status, which
seems to read IPIC_SERSR. The value is 0, both with interrupts enabled
and disabled (which makes sense, since disabling/enabling interrupts is
local to the CPU core).

> Do you use the local bus monitoring driver ?

I don't. In fact, I'm not even aware of it. What driver is that?

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 22:39           ` Radu Rendec
@ 2020-02-20  8:38             ` Christophe Leroy
  2020-02-20 16:02               ` Radu Rendec
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2020-02-20  8:38 UTC (permalink / raw)
  To: Radu Rendec; +Cc: linuxppc-dev



On 02/19/2020 10:39 PM, Radu Rendec wrote:
> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>> Radu Rendec <radu.rendec@gmail.com> a écrit :
>>>> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
>>>>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>>> Le 18/02/2020 à 18:07, Radu Rendec a écrit :
>>>>>>> The saved NIP seems to be broken inside machine_check_exception() on
>>>>>>> MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
>>>>>>> but I have seen other weird values.
>>>>>>>
>>>>>>> I've been able to track down the entry code to head_32.S (vector 0x200),
>>>>>>> but I'm not sure where/how the NIP value (where the exception occurred)
>>>>>>> is captured.
>>>>>>
>>>>>> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
>>>>>> saved into _NIP(r11) in transfer_to_handler in entry_32.S
>>>>>>
>>>>>> Can something clobber r12 at some point ?
>>>>>>
>>>>>
>>>>> I did something even simpler: I added the following
>>>>>
>>>>>       lis r12,0x1234
>>>>>
>>>>> ... right after
>>>>>
>>>>>       mfspr r12,SPRN_SRR0
>>>>>
>>>>> ... and now the NIP value I see in the crash dump is 0x12340000. This
>>>>> means r12 is not clobbered and most likely the NIP value I normally see
>>>>> is the actual SRR0 value.
>>>>
>>>> I apologize for the noise. I just found out accidentally that the saved
>>>> NIP value is correct if interrupts are disabled at the time when the
>>>> faulty access that triggers the MCE occurs. This seems to happen
>>>> consistently.
>>>>
>>>> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
>>>> it's basically enough to wrap ioread32 to get the NIP value right.
>>>>
>>>> Does this make any sense? Maybe it's not a silicon bug after all, or
>>>> maybe it is and I just found a workaround. Could this happen on other
>>>> PowerPC CPUs as well?
>>>
>>> Interesting.
>>>
>>> 0x900 is the adress of the timer interrupt.
>>>
>>> Would the MCE occur just after the timer interrupt ?
> 
> I doubt that. I'm using a small test module to artificially trigger the
> MCE. Basically it's just this (the full code is in my original post):
> 
>          bad_addr_base = ioremap(0xf0000000, 0x100);
>          x = ioread32(bad_addr_base);
> 
> I find it hard to believe that every time I load the module the lwbrx
> instruction that triggers the MCE is executed exactly after the timer
> interrupt (or that the timer interrupt always occurs close to the lwbrx
> instruction).

Can you try to see how much time there is between your read and the MCE ?
The below should allow it, you'll see first value in r13 and the other 
in r14 (mce.c is your test code)

Also provide the timebase frequency as reported in /proc/cpuinfo

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 97c887950c3c..0ae6a0a17e26 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -273,6 +273,7 @@ __secondary_hold_acknowledge:
  	. = 0x200
  	DO_KVM  0x200
  MachineCheck:
+	mftbl	r14
  	EXCEPTION_PROLOG_0
  #ifdef CONFIG_VMAP_STACK
  	li	r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
diff --git a/arch/powerpc/platforms/83xx/mce.c 
b/arch/powerpc/platforms/83xx/mce.c
index 91c2de6b73ca..0b7e4dcc0cb3 100644
--- a/arch/powerpc/platforms/83xx/mce.c
+++ b/arch/powerpc/platforms/83xx/mce.c
@@ -11,7 +11,7 @@ static int __init test_mce_init(void)
          bad_addr_base = ioremap(0xf0000000, 0x100);

          if (bad_addr_base) {
-                __asm__ __volatile__ ("isync");
+                __asm__ __volatile__ ("isync ; mftbl 13");
                  x = ioread32(bad_addr_base);
                  pr_info("Test: %#0x\n", x);
          } else


> 
>>>
>>> Can you tell how are configured your IO busses, etc ... ?
> 
> Nothing special. The device tree is mostly similar to mpc8379_rdb.dts,
> but I can provide the actual dts if you think it's relevant.
> 
>> And what's the value of SERSR after the machine check ?
> 
> I'm assuming you're talking about the IPIC SERSR register. I modified
> machine_check_exception and added a call to ipic_get_mcp_status, which
> seems to read IPIC_SERSR. The value is 0, both with interrupts enabled
> and disabled (which makes sense, since disabling/enabling interrupts is
> local to the CPU core).

And what's the reason given in the Oops message for the machine check ? 
Is that "Caused by (from SRR1=49030): Transfer error ack signal" or 
something else ?

> 
>> Do you use the local bus monitoring driver ?
> 
> I don't. In fact, I'm not even aware of it. What driver is that?

CONFIG_FSL_LBC

Christophe

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-20  8:38             ` Christophe Leroy
@ 2020-02-20 16:02               ` Radu Rendec
  2020-02-20 16:25                 ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-20 16:02 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> On 02/19/2020 10:39 PM, Radu Rendec wrote:
> > On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>> Interesting.
> >>>
> >>> 0x900 is the adress of the timer interrupt.
> >>>
> >>> Would the MCE occur just after the timer interrupt ?
> >
> > I doubt that. I'm using a small test module to artificially trigger the
> > MCE. Basically it's just this (the full code is in my original post):
> >
> >          bad_addr_base = ioremap(0xf0000000, 0x100);
> >          x = ioread32(bad_addr_base);
> >
> > I find it hard to believe that every time I load the module the lwbrx
> > instruction that triggers the MCE is executed exactly after the timer
> > interrupt (or that the timer interrupt always occurs close to the lwbrx
> > instruction).
>
> Can you try to see how much time there is between your read and the MCE ?
> The below should allow it, you'll see first value in r13 and the other
> in r14 (mce.c is your test code)
>
> Also provide the timebase frequency as reported in /proc/cpuinfo

I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.

# cat /proc/cpuinfo
processor       : 0
cpu             : e300c4
clock           : 800.000004MHz
revision        : 1.1 (pvr 8086 1011)
bogomips        : 200.00
timebase        : 100000000

The difference between r14 and r13 is 0x20000b. Assuming TB is
incremented with 'timebase' frequency, that means 20.97 milliseconds
(although the e300 manual says TB is "incremented once every four core
input clock cycles").

I repeated the test twice and the absolute values were of course very
different, but r14-r13 was 0x20000c and 0x200011, so it seems to be
quite consistent (within just a few clock cycles).

Just for the fun of it, I repeated the test once more, but with
interrupts disabled. The difference was 0x200014. FWIW, I disabled
interrupts before sampling TB in r13.

> And what's the reason given in the Oops message for the machine check ?
> Is that "Caused by (from SRR1=49030): Transfer error ack signal" or
> something else ?

When interrupts are enabled:
Caused by (from SRR1=41000): Transfer error ack signal

When interrupts are disabled:
Caused by (from SRR1=41030): Transfer error ack signal

> >
> >> Do you use the local bus monitoring driver ?
> >
> > I don't. In fact, I'm not even aware of it. What driver is that?
>
> CONFIG_FSL_LBC

OK, it seems I'm actually using it. I haven't enabled it explicitly, but
it's automatically pulled by CONFIG_MTD_NAND_FSL_ELBC as a prerequisite.

I looked at the code in arch/powerpc/sysdev/fsl_lbc.c and it's quite
small. Most of the code is in fsl_lbc_ctrl_irq, which I guess is
supposed to print a message if/when the LBC catches an error. I've never
seen any of those messages being printed.

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-20 16:02               ` Radu Rendec
@ 2020-02-20 16:25                 ` Christophe Leroy
  2020-02-20 17:34                   ` Radu Rendec
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2020-02-20 16:25 UTC (permalink / raw)
  To: Radu Rendec; +Cc: linuxppc-dev



Le 20/02/2020 à 17:02, Radu Rendec a écrit :
> On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>> On 02/19/2020 10:39 PM, Radu Rendec wrote:
>>> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>> Interesting.
>>>>>
>>>>> 0x900 is the adress of the timer interrupt.
>>>>>
>>>>> Would the MCE occur just after the timer interrupt ?
>>>
>>> I doubt that. I'm using a small test module to artificially trigger the
>>> MCE. Basically it's just this (the full code is in my original post):
>>>
>>>           bad_addr_base = ioremap(0xf0000000, 0x100);
>>>           x = ioread32(bad_addr_base);
>>>
>>> I find it hard to believe that every time I load the module the lwbrx
>>> instruction that triggers the MCE is executed exactly after the timer
>>> interrupt (or that the timer interrupt always occurs close to the lwbrx
>>> instruction).
>>
>> Can you try to see how much time there is between your read and the MCE ?
>> The below should allow it, you'll see first value in r13 and the other
>> in r14 (mce.c is your test code)
>>
>> Also provide the timebase frequency as reported in /proc/cpuinfo
> 
> I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
> 
> # cat /proc/cpuinfo
> processor       : 0
> cpu             : e300c4
> clock           : 800.000004MHz
> revision        : 1.1 (pvr 8086 1011)
> bogomips        : 200.00
> timebase        : 100000000
> 
> The difference between r14 and r13 is 0x20000b. Assuming TB is
> incremented with 'timebase' frequency, that means 20.97 milliseconds
> (although the e300 manual says TB is "incremented once every four core
> input clock cycles").

I wouldn't be surprised that the internal CPU clock be twice the input 
clock.

So that's long enough to surely get a timer interrupt during every bad 
access.

Now we have to understand why SRR1 contains the address of the timer 
exception entry and not the address of the bad access.

The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR] 
are cleared when interrupts are enabled.

Maybe you should file a support case at NXP. They are usually quite 
professionnal at responding.

Christophe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-20 16:25                 ` Christophe Leroy
@ 2020-02-20 17:34                   ` Radu Rendec
  2020-02-20 17:48                     ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-20 17:34 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/20/2020 at 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> Le 20/02/2020 à 17:02, Radu Rendec a écrit :
> > On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >> On 02/19/2020 10:39 PM, Radu Rendec wrote:
> >>> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>>> Interesting.
> >>>>>
> >>>>> 0x900 is the adress of the timer interrupt.
> >>>>>
> >>>>> Would the MCE occur just after the timer interrupt ?
> >>>
> >>> I doubt that. I'm using a small test module to artificially trigger the
> >>> MCE. Basically it's just this (the full code is in my original post):
> >>>
> >>>           bad_addr_base = ioremap(0xf0000000, 0x100);
> >>>           x = ioread32(bad_addr_base);
> >>>
> >>> I find it hard to believe that every time I load the module the lwbrx
> >>> instruction that triggers the MCE is executed exactly after the timer
> >>> interrupt (or that the timer interrupt always occurs close to the lwbrx
> >>> instruction).
> >>
> >> Can you try to see how much time there is between your read and the MCE ?
> >> The below should allow it, you'll see first value in r13 and the other
> >> in r14 (mce.c is your test code)
> >>
> >> Also provide the timebase frequency as reported in /proc/cpuinfo
> >
> > I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
> >
> > # cat /proc/cpuinfo
> > processor       : 0
> > cpu             : e300c4
> > clock           : 800.000004MHz
> > revision        : 1.1 (pvr 8086 1011)
> > bogomips        : 200.00
> > timebase        : 100000000
> >
> > The difference between r14 and r13 is 0x20000b. Assuming TB is
> > incremented with 'timebase' frequency, that means 20.97 milliseconds
> > (although the e300 manual says TB is "incremented once every four core
> > input clock cycles").
>
> I wouldn't be surprised that the internal CPU clock be twice the input
> clock.
>
> So that's long enough to surely get a timer interrupt during every bad
> access.
>
> Now we have to understand why SRR1 contains the address of the timer
> exception entry and not the address of the bad access.
>
> The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR]
> are cleared when interrupts are enabled.
>
> Maybe you should file a support case at NXP. They are usually quite
> professionnal at responding.

I already did (quite some time ago), but it started off as "why does the
MCE occur in the first place". That part has already been figured out,
but unfortunately I don't have a viable solution to it. Like you said,
now the focus has shifted to understanding why the SRR0 value is not
what we expect.

I asked them the question about SRR0 as soon as you helped me get back
on track and figured out there's nothing wrong with the Linux MCE
handler and the NIP value comes from SRR0. What they came up with is
basically this paragraph in the e300 core manual (section 5.5.2):

| Note that the e300 core makes no attempt to force recoverability on a
| machine check; however, it does guarantee that the machine check
| interrupt is always taken immediately upon request, with a nonpredicted
| address saved in SRR0, regardless of the current machine state.

... and with an emphasis on "nonpredicted". To be honest, I am a bit
disappointed with their response and I believe in this context what
"unpredicted" means is that the address that is saved to SRR0 is a
"real" address rather than the result of branch prediction. The support
folks were probably thinking "unpredictable". But that's another word
and the difference is quite subtle :)

I updated the case and added information about the interrupts and the
timing. Let's see what they come up with this time.

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-20 17:34                   ` Radu Rendec
@ 2020-02-20 17:48                     ` Christophe Leroy
  0 siblings, 0 replies; 13+ messages in thread
From: Christophe Leroy @ 2020-02-20 17:48 UTC (permalink / raw)
  To: Radu Rendec; +Cc: linuxppc-dev



Le 20/02/2020 à 18:34, Radu Rendec a écrit :
> On 02/20/2020 at 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>> Le 20/02/2020 à 17:02, Radu Rendec a écrit :
>>> On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>> On 02/19/2020 10:39 PM, Radu Rendec wrote:
>>>>> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>>>> Interesting.
>>>>>>>
>>>>>>> 0x900 is the adress of the timer interrupt.
>>>>>>>
>>>>>>> Would the MCE occur just after the timer interrupt ?
>>>>>
>>>>> I doubt that. I'm using a small test module to artificially trigger the
>>>>> MCE. Basically it's just this (the full code is in my original post):
>>>>>
>>>>>            bad_addr_base = ioremap(0xf0000000, 0x100);
>>>>>            x = ioread32(bad_addr_base);
>>>>>
>>>>> I find it hard to believe that every time I load the module the lwbrx
>>>>> instruction that triggers the MCE is executed exactly after the timer
>>>>> interrupt (or that the timer interrupt always occurs close to the lwbrx
>>>>> instruction).
>>>>
>>>> Can you try to see how much time there is between your read and the MCE ?
>>>> The below should allow it, you'll see first value in r13 and the other
>>>> in r14 (mce.c is your test code)
>>>>
>>>> Also provide the timebase frequency as reported in /proc/cpuinfo
>>>
>>> I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
>>>
>>> # cat /proc/cpuinfo
>>> processor       : 0
>>> cpu             : e300c4
>>> clock           : 800.000004MHz
>>> revision        : 1.1 (pvr 8086 1011)
>>> bogomips        : 200.00
>>> timebase        : 100000000
>>>
>>> The difference between r14 and r13 is 0x20000b. Assuming TB is
>>> incremented with 'timebase' frequency, that means 20.97 milliseconds
>>> (although the e300 manual says TB is "incremented once every four core
>>> input clock cycles").
>>
>> I wouldn't be surprised that the internal CPU clock be twice the input
>> clock.
>>
>> So that's long enough to surely get a timer interrupt during every bad
>> access.
>>
>> Now we have to understand why SRR1 contains the address of the timer
>> exception entry and not the address of the bad access.
>>
>> The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR]
>> are cleared when interrupts are enabled.
>>
>> Maybe you should file a support case at NXP. They are usually quite
>> professionnal at responding.
> 
> I already did (quite some time ago), but it started off as "why does the
> MCE occur in the first place". That part has already been figured out,
> but unfortunately I don't have a viable solution to it. Like you said,
> now the focus has shifted to understanding why the SRR0 value is not
> what we expect.
> 
> I asked them the question about SRR0 as soon as you helped me get back
> on track and figured out there's nothing wrong with the Linux MCE
> handler and the NIP value comes from SRR0. What they came up with is
> basically this paragraph in the e300 core manual (section 5.5.2):
> 
> | Note that the e300 core makes no attempt to force recoverability on a
> | machine check; however, it does guarantee that the machine check
> | interrupt is always taken immediately upon request, with a nonpredicted
> | address saved in SRR0, regardless of the current machine state.
> 
> ... and with an emphasis on "nonpredicted". To be honest, I am a bit
> disappointed with their response and I believe in this context what
> "unpredicted" means is that the address that is saved to SRR0 is a
> "real" address rather than the result of branch prediction. The support
> folks were probably thinking "unpredictable". But that's another word
> and the difference is quite subtle :)
> 
> I updated the case and added information about the interrupts and the
> timing. Let's see what they come up with this time.
> 

Yes now the point is to understand why it starts processing the timer 
interrupt at 0x900 (with IR and DR cleared as observed in SRR1) just 
before taking the Machine Check.

Allthough the execution of the decrementer interrupt is queue for after 
the completion of the failing memory access, I'd expect the Machine 
Check to take priority.

Note that I have never observed such a behaviour on MPC8321 which has an 
e300c2 core.

Christophe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: MCE handler gets NIP wrong on MPC8378
  2020-02-19 15:11   ` Radu Rendec
  2020-02-19 19:46     ` Radu Rendec
@ 2020-02-26  0:01     ` Radu Rendec
  1 sibling, 0 replies; 13+ messages in thread
From: Radu Rendec @ 2020-02-26  0:01 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev

On 02/20/2020 at 12:48 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> Le 20/02/2020 à 18:34, Radu Rendec a écrit :
> > On 02/20/2020 at 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >> Le 20/02/2020 à 17:02, Radu Rendec a écrit :
> >>> On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>> On 02/19/2020 10:39 PM, Radu Rendec wrote:
> >>>>> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>>>>> Interesting.
> >>>>>>>
> >>>>>>> 0x900 is the adress of the timer interrupt.
> >>>>>>>
> >>>>>>> Would the MCE occur just after the timer interrupt ?
> >>>>>
> >>>>> I doubt that. I'm using a small test module to artificially trigger the
> >>>>> MCE. Basically it's just this (the full code is in my original post):
> >>>>>
> >>>>>            bad_addr_base = ioremap(0xf0000000, 0x100);
> >>>>>            x = ioread32(bad_addr_base);
> >>>>>
> >>>>> I find it hard to believe that every time I load the module the lwbrx
> >>>>> instruction that triggers the MCE is executed exactly after the timer
> >>>>> interrupt (or that the timer interrupt always occurs close to the lwbrx
> >>>>> instruction).
> >>>>
> >>>> Can you try to see how much time there is between your read and the MCE ?
> >>>> The below should allow it, you'll see first value in r13 and the other
> >>>> in r14 (mce.c is your test code)
> >>>>
> >>>> Also provide the timebase frequency as reported in /proc/cpuinfo
> >>>
> >>> I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
> >>>
> >>> # cat /proc/cpuinfo
> >>> processor       : 0
> >>> cpu             : e300c4
> >>> clock           : 800.000004MHz
> >>> revision        : 1.1 (pvr 8086 1011)
> >>> bogomips        : 200.00
> >>> timebase        : 100000000
> >>>
> >>> The difference between r14 and r13 is 0x20000b. Assuming TB is
> >>> incremented with 'timebase' frequency, that means 20.97 milliseconds
> >>> (although the e300 manual says TB is "incremented once every four core
> >>> input clock cycles").
> >>
> >> I wouldn't be surprised that the internal CPU clock be twice the input
> >> clock.
> >>
> >> So that's long enough to surely get a timer interrupt during every bad
> >> access.
> >>
> >> Now we have to understand why SRR1 contains the address of the timer
> >> exception entry and not the address of the bad access.
> >>
> >> The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR]
> >> are cleared when interrupts are enabled.
> >>
> >> Maybe you should file a support case at NXP. They are usually quite
> >> professionnal at responding.
> >
> > I already did (quite some time ago), but it started off as "why does the
> > MCE occur in the first place". That part has already been figured out,
> > but unfortunately I don't have a viable solution to it. Like you said,
> > now the focus has shifted to understanding why the SRR0 value is not
> > what we expect.
>
> Yes now the point is to understand why it starts processing the timer
> interrupt at 0x900 (with IR and DR cleared as observed in SRR1) just
> before taking the Machine Check.
>
> Allthough the execution of the decrementer interrupt is queue for after
> the completion of the failing memory access, I'd expect the Machine
> Check to take priority.
>
> Note that I have never observed such a behaviour on MPC8321 which has an
> e300c2 core.

I apologize for the silence during the past few days, I've been diverted
with something else. This is the feedback that I got from NXP:

| The e300 core uses SRR0/1 for both non-critical interrupts and machine
| check interrupts and if they happen simultaneously a problem can occur
| where the return address from the first exception is lost when handling
| the second exception concurrently. This only occurs in the rare case
| when the software ISR hasn't had the time to save SRR0/1 to the sw stack.
|
| If the ability to nest interrupts is desired, software then saves off
| enough state (i.e. the contents of SRR0, SRR1, etc) that will allow it
| to recover (i.e. resume handling the current interrupt) if another
| interrupt occurs.

So basically what they describe is a race condition between the MCE and
a regular interrupt, where the regular interrupt (the timer interrupt,
in our case) kicks in after the MCE handler is entered into but before
it saves SRR0. This not only requires very precise timing, but would
also end up with a saved SRR0 value that points back somewhere inside
the MCE handler.

But I've thought about something else. We already timed it and we know
it consistently takes around 20 ms between the faulty read and the MCE
handler execution. I'm thinking that the faulty read is essentially a
failed transaction on the internal bus, because no peripheral replies
to the access on the bad address. The 20 ms is probably the bus timeout.
How does this scenario look to you?

- The faulty read starts to execute. A new internal bus transaction is
  started, the bad address is put on the bus and the CPU waits for a
  peripheral to reply.
- The timer interrupt kicks in. The CPU saves NIP to SRR0 and NIP
  becomes 0x900. But the CPU cannot start executing immediately from
  address 0x900 because the bus is blocked.
- Nobody replies and eventually the bus transaction fails. An MCE is
  triggered to handle the failed bus transaction.
- The MCE has higher priority than the timer interrupt, so it's handled
  immediately. The CPU saves NIP to SRR0 and NIP becomes 0x200.
- The CPU starts executing the MCE handler with 0x900 in SRR0.

This is pure speculation and I have absolutely no idea about the e300
core internal architecture. But it's my best guess. I've sent something
similar to NXP support. Let's see what they come up with.

By the way, I have successfully tested a fix that uses __do_inl instead
of ioread32 and disables interrupts around the __do_inl call. If I was
even close with my speculation above, then I guess the only thing we
could fix in the kernel would be to modify __do_inl and co. to disable
interrupts around the potentially dangerous access. The benefit would be
that the MCE could be recovered from. For ioread32, there is no real
benefit in doing that (other than printing the correct NIP address in
the crash dump) because it doesn't instrument the exception tables
anyway so it's non-recoverable.

Best regards,
Radu

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-02-26  0:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-18 17:07 MCE handler gets NIP wrong on MPC8378 Radu Rendec
2020-02-18 18:08 ` Christophe Leroy
2020-02-19 15:11   ` Radu Rendec
2020-02-19 19:46     ` Radu Rendec
2020-02-19 21:08       ` Christophe Leroy
2020-02-19 21:21         ` Christophe Leroy
2020-02-19 22:39           ` Radu Rendec
2020-02-20  8:38             ` Christophe Leroy
2020-02-20 16:02               ` Radu Rendec
2020-02-20 16:25                 ` Christophe Leroy
2020-02-20 17:34                   ` Radu Rendec
2020-02-20 17:48                     ` Christophe Leroy
2020-02-26  0:01     ` Radu Rendec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).