* Injecting SLB miltihit crashes kernel 5.9.0-rc5 @ 2020-09-15 8:43 Michal Suchánek 2020-09-15 12:54 ` Michael Ellerman ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Michal Suchánek @ 2020-09-15 8:43 UTC (permalink / raw) To: linuxppc-dev Hello, Using the SLB mutihit injection test module (which I did not write so I do not want to post it here) to verify updates on my 5.3 frankernekernel I found that the kernel crashes with Oops: kernel bad access. I tested on latest upstream kernel build that I have at hand and the result is te same (minus the message - nothing was logged and the kernel simply rebooted). Since the whole effort to write a real mode MCE handler was supposed to prevent this maybe the SLB injection module should be added to the kernel selftests? Thanks Michal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5 2020-09-15 8:43 Injecting SLB miltihit crashes kernel 5.9.0-rc5 Michal Suchánek @ 2020-09-15 12:54 ` Michael Ellerman 2020-09-16 0:51 ` Nicholas Piggin 2020-09-15 18:06 ` Michal Suchanek 2020-09-16 7:12 ` Injecting SLB miltihit crashes kernel 5.9.0-rc5 Mahesh Jagannath Salgaonkar 2 siblings, 1 reply; 10+ messages in thread From: Michael Ellerman @ 2020-09-15 12:54 UTC (permalink / raw) To: Michal Suchánek, linuxppc-dev, mahesh Michal Suchánek <msuchanek@suse.de> writes: > Hello, > > Using the SLB mutihit injection test module (which I did not write so I > do not want to post it here) to verify updates on my 5.3 frankernekernel > I found that the kernel crashes with Oops: kernel bad access. > > I tested on latest upstream kernel build that I have at hand and the > result is te same (minus the message - nothing was logged and the kernel > simply rebooted). That's disappointing. > Since the whole effort to write a real mode MCE handler was supposed to > prevent this maybe the SLB injection module should be added to the > kernel selftests? Yes I'd like to see it upstream. I think it should be integrated into LKDTM, which contains other dangerous things like that and is designed for testing how the kernel handles/recovers from bad conditions. cheers ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5 2020-09-15 12:54 ` Michael Ellerman @ 2020-09-16 0:51 ` Nicholas Piggin 0 siblings, 0 replies; 10+ messages in thread From: Nicholas Piggin @ 2020-09-16 0:51 UTC (permalink / raw) To: linuxppc-dev, mahesh, Michael Ellerman, Michal Suchánek Excerpts from Michael Ellerman's message of September 15, 2020 10:54 pm: > Michal Suchánek <msuchanek@suse.de> writes: >> Hello, >> >> Using the SLB mutihit injection test module (which I did not write so I >> do not want to post it here) to verify updates on my 5.3 frankernekernel >> I found that the kernel crashes with Oops: kernel bad access. >> >> I tested on latest upstream kernel build that I have at hand and the >> result is te same (minus the message - nothing was logged and the kernel >> simply rebooted). > > That's disappointing. It seems to work okay with qemu and mambo injection on upstream (powernv_defconfig). I wonder why that nmi_enter is crashing. Can you post the output of a successful test with the patch reverted? qemu injection test output - [ 195.279885][ C0] Disabling lock debugging due to kernel taint [ 195.280891][ C0] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: 00000000deadbeef [Recovered] [ 195.282117][ C0] MCE: CPU0: NIP: [c00000000003c2b4] isa300_idle_stop_mayloss+0x68/0x6c [ 195.283631][ C0] MCE: CPU0: Initiator CPU [ 195.284432][ C0] MCE: CPU0: Probable Software error (some chance of hardware cause) [ 220.711577][ T90] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: 00000000deadbeef [Recovered] [ 220.712805][ T90] MCE: CPU0: PID: 90 Comm: yes NIP: [00007fff7fdac2e0] [ 220.713553][ T90] MCE: CPU0: Initiator CPU [ 220.714021][ T90] MCE: CPU0: Probable Software error (some chance of hardware cause) Thanks, Nick ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" 2020-09-15 8:43 Injecting SLB miltihit crashes kernel 5.9.0-rc5 Michal Suchánek @ 2020-09-15 18:06 ` Michal Suchanek 2020-09-15 18:06 ` Michal Suchanek 2020-09-16 7:12 ` Injecting SLB miltihit crashes kernel 5.9.0-rc5 Mahesh Jagannath Salgaonkar 2 siblings, 0 replies; 10+ messages in thread From: Michal Suchanek @ 2020-09-15 18:06 UTC (permalink / raw) To: linuxppc-dev Cc: Michal Suchanek, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, Nicholas Piggin, Santosh Sivaraj, Christophe Leroy, Ganesh Goudar, Mahesh Salgaonkar, Alistair Popple, Jordan Niethe, Mike Rapoport, Peter Zijlstra, Aneesh Kumar K.V, Greg Kroah-Hartman, linux-kernel, stable This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. This commit causes the kernel to oops and reboot when injecting a SLB multihit which causes a MCE. Before this commit a SLB multihit was corrected by the kernel and the system continued to operate normally. cc: stable@vger.kernel.org Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") Signed-off-by: Michal Suchanek <msuchanek@suse.de> --- arch/powerpc/kernel/mce.c | 7 ------- arch/powerpc/kernel/traps.c | 18 +++--------------- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index ada59f6c4298..2e13528dcc92 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -591,14 +591,10 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info); long notrace machine_check_early(struct pt_regs *regs) { long handled = 0; - bool nested = in_nmi(); u8 ftrace_enabled = this_cpu_get_ftrace_enabled(); this_cpu_set_ftrace_enabled(0); - if (!nested) - nmi_enter(); - hv_nmi_check_nonrecoverable(regs); /* @@ -607,9 +603,6 @@ long notrace machine_check_early(struct pt_regs *regs) if (ppc_md.machine_check_early) handled = ppc_md.machine_check_early(regs); - if (!nested) - nmi_exit(); - this_cpu_set_ftrace_enabled(ftrace_enabled); return handled; diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index d1ebe152f210..7853b770918d 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -827,19 +827,7 @@ void machine_check_exception(struct pt_regs *regs) { int recover = 0; - /* - * BOOK3S_64 does not call this handler as a non-maskable interrupt - * (it uses its own early real-mode handler to handle the MCE proper - * and then raises irq_work to call this handler when interrupts are - * enabled). - * - * This is silly. The BOOK3S_64 should just call a different function - * rather than expecting semantics to magically change. Something - * like 'non_nmi_machine_check_exception()', perhaps? - */ - const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64); - - if (nmi) nmi_enter(); + nmi_enter(); __this_cpu_inc(irq_stat.mce_exceptions); @@ -865,7 +853,7 @@ void machine_check_exception(struct pt_regs *regs) if (check_io_access(regs)) goto bail; - if (nmi) nmi_exit(); + nmi_exit(); die("Machine check", regs, SIGBUS); @@ -876,7 +864,7 @@ void machine_check_exception(struct pt_regs *regs) return; bail: - if (nmi) nmi_exit(); + nmi_exit(); } void SMIException(struct pt_regs *regs) -- 2.28.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" @ 2020-09-15 18:06 ` Michal Suchanek 0 siblings, 0 replies; 10+ messages in thread From: Michal Suchanek @ 2020-09-15 18:06 UTC (permalink / raw) To: linuxppc-dev Cc: Aneesh Kumar K.V, Santosh Sivaraj, Alistair Popple, Greg Kroah-Hartman, Peter Zijlstra, Mahesh Salgaonkar, Nicholas Piggin, linux-kernel, Paul Mackerras, Ganesh Goudar, stable, Michal Suchanek, Jordan Niethe, Mike Rapoport This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. This commit causes the kernel to oops and reboot when injecting a SLB multihit which causes a MCE. Before this commit a SLB multihit was corrected by the kernel and the system continued to operate normally. cc: stable@vger.kernel.org Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") Signed-off-by: Michal Suchanek <msuchanek@suse.de> --- arch/powerpc/kernel/mce.c | 7 ------- arch/powerpc/kernel/traps.c | 18 +++--------------- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index ada59f6c4298..2e13528dcc92 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -591,14 +591,10 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info); long notrace machine_check_early(struct pt_regs *regs) { long handled = 0; - bool nested = in_nmi(); u8 ftrace_enabled = this_cpu_get_ftrace_enabled(); this_cpu_set_ftrace_enabled(0); - if (!nested) - nmi_enter(); - hv_nmi_check_nonrecoverable(regs); /* @@ -607,9 +603,6 @@ long notrace machine_check_early(struct pt_regs *regs) if (ppc_md.machine_check_early) handled = ppc_md.machine_check_early(regs); - if (!nested) - nmi_exit(); - this_cpu_set_ftrace_enabled(ftrace_enabled); return handled; diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index d1ebe152f210..7853b770918d 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -827,19 +827,7 @@ void machine_check_exception(struct pt_regs *regs) { int recover = 0; - /* - * BOOK3S_64 does not call this handler as a non-maskable interrupt - * (it uses its own early real-mode handler to handle the MCE proper - * and then raises irq_work to call this handler when interrupts are - * enabled). - * - * This is silly. The BOOK3S_64 should just call a different function - * rather than expecting semantics to magically change. Something - * like 'non_nmi_machine_check_exception()', perhaps? - */ - const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64); - - if (nmi) nmi_enter(); + nmi_enter(); __this_cpu_inc(irq_stat.mce_exceptions); @@ -865,7 +853,7 @@ void machine_check_exception(struct pt_regs *regs) if (check_io_access(regs)) goto bail; - if (nmi) nmi_exit(); + nmi_exit(); die("Machine check", regs, SIGBUS); @@ -876,7 +864,7 @@ void machine_check_exception(struct pt_regs *regs) return; bail: - if (nmi) nmi_exit(); + nmi_exit(); } void SMIException(struct pt_regs *regs) -- 2.28.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" 2020-09-15 18:06 ` Michal Suchanek @ 2020-09-15 18:16 ` peterz -1 siblings, 0 replies; 10+ messages in thread From: peterz @ 2020-09-15 18:16 UTC (permalink / raw) To: Michal Suchanek Cc: linuxppc-dev, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, Nicholas Piggin, Santosh Sivaraj, Christophe Leroy, Ganesh Goudar, Mahesh Salgaonkar, Alistair Popple, Jordan Niethe, Mike Rapoport, Aneesh Kumar K.V, Greg Kroah-Hartman, linux-kernel, stable On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote: > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. > > This commit causes the kernel to oops and reboot when injecting a SLB > multihit which causes a MCE. > > Before this commit a SLB multihit was corrected by the kernel and the > system continued to operate normally. > > cc: stable@vger.kernel.org > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > Signed-off-by: Michal Suchanek <msuchanek@suse.de> Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") nmi_enter() supports nesting natively. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" @ 2020-09-15 18:16 ` peterz 0 siblings, 0 replies; 10+ messages in thread From: peterz @ 2020-09-15 18:16 UTC (permalink / raw) To: Michal Suchanek Cc: Aneesh Kumar K.V, Santosh Sivaraj, Alistair Popple, Greg Kroah-Hartman, Mahesh Salgaonkar, Nicholas Piggin, linux-kernel, Jordan Niethe, Paul Mackerras, Ganesh Goudar, stable, linuxppc-dev, Mike Rapoport On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote: > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. > > This commit causes the kernel to oops and reboot when injecting a SLB > multihit which causes a MCE. > > Before this commit a SLB multihit was corrected by the kernel and the > system continued to operate normally. > > cc: stable@vger.kernel.org > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > Signed-off-by: Michal Suchanek <msuchanek@suse.de> Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") nmi_enter() supports nesting natively. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" 2020-09-15 18:16 ` peterz @ 2020-09-16 9:00 ` Michal Suchánek -1 siblings, 0 replies; 10+ messages in thread From: Michal Suchánek @ 2020-09-16 9:00 UTC (permalink / raw) To: peterz Cc: linuxppc-dev, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, Nicholas Piggin, Santosh Sivaraj, Christophe Leroy, Ganesh Goudar, Mahesh Salgaonkar, Alistair Popple, Jordan Niethe, Mike Rapoport, Aneesh Kumar K.V, Greg Kroah-Hartman, linux-kernel, stable On Tue, Sep 15, 2020 at 08:16:42PM +0200, peterz@infradead.org wrote: > On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote: > > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. > > > > This commit causes the kernel to oops and reboot when injecting a SLB > > multihit which causes a MCE. > > > > Before this commit a SLB multihit was corrected by the kernel and the > > system continued to operate normally. > > > > cc: stable@vger.kernel.org > > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > > Signed-off-by: Michal Suchanek <msuchanek@suse.de> > > Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") > nmi_enter() supports nesting natively. And this patch was merged in parallel with this native nesting support and conflicted with it - hence the explicit nesting in the hunk that did not conflict. Either way the bug is present on kernels both with and without 69ea03b56ed2. So besides the conflict 69ea03b56ed2 does not affect this problem. Thanks Michal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" @ 2020-09-16 9:00 ` Michal Suchánek 0 siblings, 0 replies; 10+ messages in thread From: Michal Suchánek @ 2020-09-16 9:00 UTC (permalink / raw) To: peterz Cc: Aneesh Kumar K.V, Santosh Sivaraj, Alistair Popple, Greg Kroah-Hartman, Mahesh Salgaonkar, Nicholas Piggin, linux-kernel, Jordan Niethe, Paul Mackerras, Ganesh Goudar, stable, linuxppc-dev, Mike Rapoport On Tue, Sep 15, 2020 at 08:16:42PM +0200, peterz@infradead.org wrote: > On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote: > > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. > > > > This commit causes the kernel to oops and reboot when injecting a SLB > > multihit which causes a MCE. > > > > Before this commit a SLB multihit was corrected by the kernel and the > > system continued to operate normally. > > > > cc: stable@vger.kernel.org > > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > > Signed-off-by: Michal Suchanek <msuchanek@suse.de> > > Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") > nmi_enter() supports nesting natively. And this patch was merged in parallel with this native nesting support and conflicted with it - hence the explicit nesting in the hunk that did not conflict. Either way the bug is present on kernels both with and without 69ea03b56ed2. So besides the conflict 69ea03b56ed2 does not affect this problem. Thanks Michal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5 2020-09-15 8:43 Injecting SLB miltihit crashes kernel 5.9.0-rc5 Michal Suchánek 2020-09-15 12:54 ` Michael Ellerman 2020-09-15 18:06 ` Michal Suchanek @ 2020-09-16 7:12 ` Mahesh Jagannath Salgaonkar 2 siblings, 0 replies; 10+ messages in thread From: Mahesh Jagannath Salgaonkar @ 2020-09-16 7:12 UTC (permalink / raw) To: Michal Suchánek, linuxppc-dev On 9/15/20 2:13 PM, Michal Suchánek wrote: > Hello, > > Using the SLB mutihit injection test module (which I did not write so I > do not want to post it here) to verify updates on my 5.3 frankernekernel > I found that the kernel crashes with Oops: kernel bad access. > > I tested on latest upstream kernel build that I have at hand and the > result is te same (minus the message - nothing was logged and the kernel > simply rebooted). Yes, SLB multihit recovery is broken upstream. Fix is on the way. > > Since the whole effort to write a real mode MCE handler was supposed to > prevent this maybe the SLB injection module should be added to the > kernel selftests? Yes. We are working on adding SLB injection selftest patches will be posted soon. Thanks, -Mahesh. > > Thanks > > Michal > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-09-16 9:02 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-15 8:43 Injecting SLB miltihit crashes kernel 5.9.0-rc5 Michal Suchánek 2020-09-15 12:54 ` Michael Ellerman 2020-09-16 0:51 ` Nicholas Piggin 2020-09-15 18:06 ` [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting" Michal Suchanek 2020-09-15 18:06 ` Michal Suchanek 2020-09-15 18:16 ` peterz 2020-09-15 18:16 ` peterz 2020-09-16 9:00 ` Michal Suchánek 2020-09-16 9:00 ` Michal Suchánek 2020-09-16 7:12 ` Injecting SLB miltihit crashes kernel 5.9.0-rc5 Mahesh Jagannath Salgaonkar
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.