MCE handler gets NIP wrong on MPC8378

* MCE handler gets NIP wrong on MPC8378
@ 2020-02-18 17:07 Radu Rendec
  2020-02-18 18:08 ` Christophe Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Radu Rendec @ 2020-02-18 17:07 UTC (permalink / raw)
  To: linuxppc-dev

Hi Everyone,

The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
but I have seen other weird values.

I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.

I also noticed most of the code has moved to head_32.h in newer kernel
versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
the same. I guess the same thing happens on a recent kernel, even though
I don't have an easy way to test it.

The original MCE that I see is triggered by a failed PCIe transaction,
but I was able to reproduce it by just reading from a (physically)
unmapped memory area. Sample code and kernel crash dump are included
below.

Can anyone please provide any suggestion as to what to look at next?

Thanks,
Radu

8<--------------------------------------------------------------------

#include <linux/module.h>
#include <linux/delay.h>
#include <asm/io.h>

static void __iomem *bad_addr_base;

static int __init test_mce_init(void)
{
        unsigned int x;

        bad_addr_base = ioremap(0xf0000000, 0x100);

        if (bad_addr_base) {
                __asm__ __volatile__ ("isync");
                x = ioread32(bad_addr_base);
                pr_info("Test: %#0x\n", x);
        } else
                pr_err("Cannot map\n");

        return 0;
}

static void __exit test_mce_exit(void)
{
        iounmap(bad_addr_base);
}

module_init(test_mce_init);
module_exit(test_mce_exit);

MODULE_LICENSE("GPL");

8<--------------------------------------------------------------------

[   14.977053] mce: loading out-of-tree module taints kernel.
[   15.004285] Disabling lock debugging due to kernel taint
[   15.026151] Machine check in kernel mode.
[   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer
error ack signal
[   15.037652] Oops: Machine check 1, sig: 7 [#1]
[   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
[   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
[   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M     C O
4.9.191-default-mpc8378-p3c692a64ae1d #31
[   15.078175] task: 9e83e550 task.stack: 9ea2e000
[   15.082699] NIP: 00000900 LR: b147e030 CTR: 80015d50
[   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M     C O
(4.9.191-default-mpc8378-p3c692a64ae1d)
[   15.098084] MSR: 00041000 <ME>[   15.100973]   CR: 42002228  XER: 00000000
[   15.104976] DAR: 80017414 DSISR: 00000000
GPR00: b147e030 9ea2fd50 9e83e550 00000000 b1480000 9c652200 9ea2fd18 00000000
GPR08: 9c652200 00000000 b1480000 00001032 80015d50 100b93d0 b147e308 805eb3d8
GPR16: 0000003a 00000550 b1473b5c b147c2a4 8048e444 80082b08 00000000 b147c0e8
GPR24: 00000124 00000578 00000000 00000000 b147c0a0 b147e000 9eb7c280 b147c0a0
NIP [00000900] 0x900
[   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.144528] Call Trace:
[   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
[   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
[   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
[   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
[   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
[   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
[   15.188005] --- interrupt: c01 at 0xfdfdb64
[   15.188005]     LR = 0x10013c64
[   15.195309] Instruction dump:
[   15.198274] 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX
00000000 XXXXXXXX
[   15.206047] 00000000 XXXXXXXX 00000000 XXXXXXXX 7d5043a6 XXXXXXXX
7d400026 XXXXXXXX
[   15.213824] ---[ end trace d38922938e009d45 ]---
[   15.218434]
[   16.219951] Kernel panic - not syncing: Fatal exception
[   16.225174] Rebooting in 1 seconds..

^ permalink raw reply	[flat|nested] 13+ messages in thread