From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Radu Rendec <radu.rendec@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: MCE handler gets NIP wrong on MPC8378
Date: Thu, 20 Feb 2020 08:38:24 +0000
Message-ID: <09e9a042-766c-d2e6-2300-cebc372cabde@c-s.fr>
In-Reply-To: <CAD5jUk-Wta-W26D7PUwi2__2GoDp9pOrKMiNCdu9TnWgMvy4GQ@mail.gmail.com>



On 02/19/2020 10:39 PM, Radu Rendec wrote:
> On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>> Radu Rendec <radu.rendec@gmail.com> wrote:
>>>> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec@gmail.com> wrote:
>>>>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>>> On 18/02/2020 at 18:07, Radu Rendec wrote:
>>>>>>> The saved NIP seems to be broken inside machine_check_exception() on
>>>>>>> MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
>>>>>>> but I have seen other weird values.
>>>>>>>
>>>>>>> I've been able to track down the entry code to head_32.S (vector 0x200),
>>>>>>> but I'm not sure where/how the NIP value (where the exception occurred)
>>>>>>> is captured.
>>>>>>
>>>>>> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
>>>>>> saved into _NIP(r11) in transfer_to_handler in entry_32.S
>>>>>>
>>>>>> Could something be clobbering r12 at some point?
>>>>>>
>>>>>
>>>>> I did something even simpler: I added the following
>>>>>
>>>>>       lis r12,0x1234
>>>>>
>>>>> ... right after
>>>>>
>>>>>       mfspr r12,SPRN_SRR0
>>>>>
>>>>> ... and now the NIP value I see in the crash dump is 0x12340000. This
>>>>> means r12 is not clobbered and most likely the NIP value I normally see
>>>>> is the actual SRR0 value.
>>>>
>>>> I apologize for the noise. I just found out accidentally that the saved
>>>> NIP value is correct if interrupts are disabled at the time when the
>>>> faulty access that triggers the MCE occurs. This seems to happen
>>>> consistently.
>>>>
>>>> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
>>>> it's basically enough to wrap ioread32 to get the NIP value right.
>>>>
>>>> Does this make any sense? Maybe it's not a silicon bug after all, or
>>>> maybe it is and I just found a workaround. Could this happen on other
>>>> PowerPC CPUs as well?
>>>
>>> Interesting.
>>>
>>> 0x900 is the address of the timer interrupt.
>>>
>>> Could the MCE be occurring just after the timer interrupt?
> 
> I doubt that. I'm using a small test module to artificially trigger the
> MCE. Basically it's just this (the full code is in my original post):
> 
>          bad_addr_base = ioremap(0xf0000000, 0x100);
>          x = ioread32(bad_addr_base);
> 
> I find it hard to believe that every time I load the module the lwbrx
> instruction that triggers the MCE is executed exactly after the timer
> interrupt (or that the timer interrupt always occurs close to the lwbrx
> instruction).

Can you try to measure how much time elapses between your read and the MCE?
The patch below should allow it: you'll see the first timebase value in r13
and the second in r14 (mce.c is your test code).

Also, please provide the timebase frequency as reported in /proc/cpuinfo.

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 97c887950c3c..0ae6a0a17e26 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -273,6 +273,7 @@ __secondary_hold_acknowledge:
  	. = 0x200
  	DO_KVM  0x200
  MachineCheck:
+	mftbl	r14
  	EXCEPTION_PROLOG_0
  #ifdef CONFIG_VMAP_STACK
  	li	r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
diff --git a/arch/powerpc/platforms/83xx/mce.c b/arch/powerpc/platforms/83xx/mce.c
index 91c2de6b73ca..0b7e4dcc0cb3 100644
--- a/arch/powerpc/platforms/83xx/mce.c
+++ b/arch/powerpc/platforms/83xx/mce.c
@@ -11,7 +11,7 @@ static int __init test_mce_init(void)
          bad_addr_base = ioremap(0xf0000000, 0x100);

          if (bad_addr_base) {
-                __asm__ __volatile__ ("isync");
+                __asm__ __volatile__ ("isync ; mftbl 13");
                  x = ioread32(bad_addr_base);
                  pr_info("Test: %#0x\n", x);
          } else
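
Once r13, r14 and the timebase frequency are known, the delay could be
worked out roughly as in the sketch below. The helper names are
hypothetical and not part of the patch. It assumes both values come from
mftbl (the lower 32 bits of the timebase) and that the timebase wraps at
most once between the two reads.

```c
#include <stdint.h>

/*
 * Ticks elapsed between two mftbl reads. Unsigned 32-bit subtraction
 * still gives the right result if the lower timebase word wrapped once.
 */
static uint32_t tb_delta_ticks(uint32_t tb_start, uint32_t tb_end)
{
	return tb_end - tb_start;
}

/*
 * Convert timebase ticks to nanoseconds. tb_freq_hz is the timebase
 * frequency in Hz, as reported in /proc/cpuinfo.
 */
static uint64_t tb_ticks_to_ns(uint32_t ticks, uint32_t tb_freq_hz)
{
	return (uint64_t)ticks * 1000000000ULL / tb_freq_hz;
}
```

For example, tb_ticks_to_ns(tb_delta_ticks(r13, r14), freq) would give the
time from the mftbl just before ioread32() to the entry of the
MachineCheck vector.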


> 
>>>
>>> Can you tell me how your IO buses are configured, etc.?
> 
> Nothing special. The device tree is mostly similar to mpc8379_rdb.dts,
> but I can provide the actual dts if you think it's relevant.
> 
>> And what's the value of SERSR after the machine check ?
> 
> I'm assuming you're talking about the IPIC SERSR register. I modified
> machine_check_exception and added a call to ipic_get_mcp_status, which
> seems to read IPIC_SERSR. The value is 0, both with interrupts enabled
> and disabled (which makes sense, since disabling/enabling interrupts is
> local to the CPU core).

And what reason does the Oops message give for the machine check?
Is it "Caused by (from SRR1=49030): Transfer error ack signal" or
something else?
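
For reference, here is a rough sketch of how that message could be derived
from SRR1. The mask and bit values are assumptions, written from memory of
the generic 32-bit path in machine_check_exception(), and should be
checked against arch/powerpc/kernel/traps.c for your kernel version. The
names mce_cause() and MCE_CAUSE_MASK are hypothetical.

```c
#include <stdint.h>

/* Assumed mask for the cause bits in SRR1; verify against traps.c. */
#define MCE_CAUSE_MASK 0x601F0000u

/* Map the SRR1 cause bits to the text printed after "Caused by". */
static const char *mce_cause(uint32_t srr1)
{
	switch (srr1 & MCE_CAUSE_MASK) {
	case 0x00080000u:
		return "Machine check signal";
	case 0x00040000u:
		return "Transfer error ack signal";
	case 0x00020000u:
		return "Data parity error signal";
	case 0x00010000u:
		return "Address parity error signal";
	default:
		return "unrecognised cause";
	}
}
```

Under these assumptions, mce_cause(0x49030) selects the 0x40000 case,
i.e. "Transfer error ack signal"; the low bits (0x9030) are ordinary MSR
bits saved in SRR1 and are masked out.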

> 
>> Do you use the local bus monitoring driver ?
> 
> I don't. In fact, I'm not even aware of it. What driver is that?

CONFIG_FSL_LBC

Christophe
