From: Christophe LEROY <christophe.leroy@c-s.fr>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
Date: Thu, 11 Oct 2018 16:31:16 +0200
Message-ID: <9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>
In-Reply-To: <20181009211650.042d428c@roar.ozlabs.ibm.com>



On 09/10/2018 at 13:16, Nicholas Piggin wrote:
> On Tue, 9 Oct 2018 09:36:18 +0000
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>       
>>>>>> Hi Nick,
>>>>>>
>>>>>> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>> ---
>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>      
>>>>>>>      void machine_check_exception(struct pt_regs *regs)
>>>>>>>      {
>>>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>>>      	int recover = 0;
>>>>>>> +	bool nested = in_nmi();
>>>>>>> +	if (!nested)
>>>>>>> +		nmi_enter();
>>>>>>
>>>>>> This alters preempt_count, so when die() is called,
>>>>>> in_interrupt() returns true although the trap didn't happen in
>>>>>> interrupt context, and oops_end() panics with "Fatal exception in
>>>>>> interrupt" instead of gently sending SIGBUS to the faulting app.
>>>>>
>>>>> Thanks for tracking that down.
>>>>>       
>>>>>> Any idea on how to fix this ?
>>>>>
>>>>> I would say we have to deliver the sigbus by hand.
>>>>>
>>>>>        if ((user_mode(regs)))
>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>        else
>>>>>            die("Machine check", regs, SIGBUS);
>>>>>       
>>>>
>>>> And what about all the other things done by 'die()' ?
>>>>
>>>> And what if it is a kernel thread ?
>>>>
>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>> and if it gets a machine check I expect it to gently stop and the die
>>>> notification to be delivered to all registered notifiers.
>>>>
>>>> Before this patch, it was working well.
>>>
>>> I guess the alternative is we could check regs->trap for machine
>>> check in the die test. Complication is having to account for MCE
>>> in an interrupt handler.
>>>
>>>          if (in_interrupt()) {
>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>                       panic("Fatal exception in interrupt");
>>>          }
>>>
>>> Something like that might work for you? We need a ppc64 macro for the
>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>> the second part of the test.
>>
>> Don't know, I'm away from home on business trip so I won't be able to
>> test anything before next week. However it looks more or less like a
>> hack, doesn't it ?
> 
> I thought it seemed okay (with the right functions added). Actually it
> could be a bit nicer to do this, then it works generally :
> 
>           if (in_interrupt()) {
>                    if (!in_nmi() || in_nmi_from_interrupt())
>                        panic("Fatal exception in interrupt");
>           }
> 
>>
>> What about the following ?
> 
> Hmm, in some ways maybe it's nicer. One complication is I would like the
> same thing to be available for platform specific machine check
> handlers, so then you need to pass is_in_interrupt to them. Which you
> can do without any problem... But is it cleaner than the above?

For me it looks cleaner than twiddling the preempt_count depending on
whether or not we were already in nmi().

Let's draft something and see what it looks like.
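
To make the discussion concrete, here is a rough userspace model of the
check you proposed. It is only a sketch, not kernel code: the bit layout is
what I assume include/linux/preempt.h uses today (8 preempt bits, 8 softirq
bits, 4 hardirq bits, 1 NMI bit), the functions take the count as an
explicit argument instead of reading the real preempt_count(), and
in_nmi_from_interrupt() and oops_should_panic() are hypothetical helpers
that do not exist in the tree.

```c
#include <stdbool.h>

/* Assumed preempt_count layout; verify against include/linux/preempt.h. */
#define SOFTIRQ_SHIFT	8
#define HARDIRQ_SHIFT	16
#define NMI_SHIFT	20

#define SOFTIRQ_OFFSET	(1u << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET	(1u << HARDIRQ_SHIFT)
#define NMI_OFFSET	(1u << NMI_SHIFT)

#define SOFTIRQ_MASK	(0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(0xfu << HARDIRQ_SHIFT)
#define NMI_MASK	(0x1u << NMI_SHIFT)

/* Model of irq_count(): keeps only the softirq, hardirq and NMI fields. */
static inline unsigned int irq_count(unsigned int pc)
{
	return pc & (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK);
}

static inline bool in_interrupt(unsigned int pc)
{
	return irq_count(pc) != 0;
}

static inline bool in_nmi(unsigned int pc)
{
	return (pc & NMI_MASK) != 0;
}

/* Model of nmi_enter(): returns the count after entering the NMI. */
static inline unsigned int nmi_enter(unsigned int pc)
{
	return pc + NMI_OFFSET + HARDIRQ_OFFSET;
}

/*
 * Hypothetical helper: true when the NMI interrupted something that was
 * itself in softirq or hardirq context, i.e. the count holds more than
 * the single nmi_enter() contribution.
 */
static inline bool in_nmi_from_interrupt(unsigned int pc)
{
	return in_nmi(pc) && irq_count(pc) != (NMI_OFFSET + HARDIRQ_OFFSET);
}

/* Hypothetical helper: the test oops_end() would apply before panicking. */
static inline bool oops_should_panic(unsigned int pc)
{
	return in_interrupt(pc) && (!in_nmi(pc) || in_nmi_from_interrupt(pc));
}
```

With this model, a machine check taken from plain task context
(nmi_enter(0)) still makes in_interrupt() true but does not ask for a
panic, while a machine check taken on top of a hardirq
(nmi_enter(HARDIRQ_OFFSET)) does.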


> 
> I guess one advantage of yours is that a BUG somewhere in the NMI path
> will panic the system. Or is that a disadvantage?

Why would it panic the system more than now? And is it an issue at all?
Doesn't BUG() panic in any case?

Christophe

