linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: He Ying <heying24@huawei.com>
Cc: <tglx@linutronix.de>, <julien.thierry.kdev@gmail.com>,
	<catalin.marinas@arm.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	Mark Rutland <mark.rutland@arm.com>
Subject: Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
Date: Fri, 23 Apr 2021 09:08:13 +0100	[thread overview]
Message-ID: <87tunxo236.wl-maz@kernel.org> (raw)
In-Reply-To: <47abc8a6-0f73-d1c0-789f-e979d4191ab2@huawei.com>

On Fri, 23 Apr 2021 04:29:44 +0100,
He Ying <heying24@huawei.com> wrote:

[...]

> >>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> >>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> >>>> means an interrupt preempts the other one and the new one is an NMI.
> >>>> Furthermore, by adding some prints, we find the first irq also calls
> >>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> >>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> >>>> gic_handle_irq(). At this moment, the second irq preempts the first irq
> >>>> and it's an NMI but current context is already in nmi. So that may be
> >>>> the problem.
> >>> I'm not sure I get it. From the stack trace, I see this:
> >>> 
> >>> [   14.816251]  asm_nmi_enter+0x94/0x98
> >>> [   14.816251]  el1_irq+0x8c/0x180			(C)
> >>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
> >>> [   14.816252]  el1_irq+0xcc/0x180			(B)
> >>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
> >>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> >>> [   14.816253]  generic_handle_irq+0x34/0x50
> >>> [   14.816254]  __handle_domain_irq+0x68/0xc0
> >>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
> >>> [   14.816255]  el1_irq+0xcc/0x180			(A)
> >>> 
> >>> which indicates that we preempted a timer interrupt (A) with another
> >>> IRQ (B), itself immediately preempted by another IRQ (C)? That's
> >>> indeed at least one too many.
> >>> 
> >>> Can you please describe for each of (A), (B) and (C) whether they are
> >>> spurious or not, what their priorities are if they aren't spurious?
> >> Yes. I ignored interrupt (A). (B) is spurious and its priority is
> >> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note
> >> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60,
> >> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel.
> > If (B) is spurious (aka ICC_IAR1R_EL1 return 1023), then its
> > "priority" doesn't really exist, and I don't really get what you mean
> > by "its priority is 0xa0".  ICC_RPR_EL1 shouldn't change when Ack-ing
> > a spurious interrupt, because there is no change in GIC state at all.
> OK. By saying "its priority is 0xa0", I just mean ICC_RPR_EL1 is read
> as 0xa0.

Right. That's very different from saying that an interrupt has
priority 0xA0.

> > 
> > And if PMR is 0x70 at the point where you get (B), then I really can't
> > see how you can get an interrupt of priority 0xa0 anyway.
> 
> Yes, it also confuses me. Perhaps ICC_RPR_EL1 changes when read, I
> think.

No, it just means that we were already in an interrupt context when we
handled the spurious interrupt: I think interrupt (A) above was being
handled with priority 0xa0, preempted by a spurious interrupt (B) which
did the wrong thing by messing with the NMI state.

Probably (B) was an NMI that got retired early and presented again to
the cpuif as (C). If that's the case, the GIC implementation may be a
bit flaky. But the driver definitely has a bug.

[...]

> > I believe the above patch would fix the spurious interrupt issue you
> > have experienced. Please let me know, and post a v2 if this works for
> > you.
> 
> Fortunately, it works. I'll post a v2. Should I cc stable mail list?

Yes please.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

      reply	other threads:[~2021-04-23  8:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16  6:22 [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups He Ying
2021-04-16 14:15 ` Marc Zyngier
2021-04-17  2:01   ` He Ying
2021-04-21  9:13     ` He Ying
2021-04-22 12:27     ` Marc Zyngier
2021-04-23  3:29       ` He Ying
2021-04-23  8:08         ` Marc Zyngier [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tunxo236.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=heying24@huawei.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).