linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Enrico Weigelt, metux IT consult" <info@metux.net>
To: Thomas Gleixner <tglx@linutronix.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Enrico Weigelt, metux IT consult" <info@metux.net>,
	linux-kernel@vger.kernel.org
Cc: James.Bottomley@HansenPartnership.com, deller@gmx.de,
	benh@kernel.crashing.org, paulus@samba.org, jdike@addtoit.com,
	richard@nod.at, anton.ivanov@cambridgegreys.com,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, linux-um@lists.infradead.org
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings
Date: Wed, 16 Dec 2020 17:42:18 +0100	[thread overview]
Message-ID: <7cee04bd-b962-c6cd-19d7-b1f63926f570@metux.net> (raw)
In-Reply-To: <87a6ueu3af.fsf@nanos.tec.linutronix.de>

On 15.12.20 23:12, Thomas Gleixner wrote:
> On Tue, Dec 15 2020 at 21:12, Enrico Weigelt wrote:
>> On 09.12.20 00:01, Thomas Gleixner wrote:
>>>   3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
>>>      handed in by the caller does not resolve to a mapped Linux
>>>      interrupt which is pretty much the same as the x86 situation above
>>>      in #1, but it prints useless data.
>>>
>>>      It prints 'irq' which is invalid but it does not print the really
>>>      interesting 'hwirq' which was handed in by the caller and did
>>>      not resolve.
>>
>> I wouldn't say the irq-nr isn't interesting. In my particular case it
>> was quite what I've been looking for. But you're right, hwirq should
>> also be printed.
> 
> The number is _not_ interesting in this case. It's useless because the
> function does:

Oh, I've mixed up the cases - I only had the other one, down below.

>     irq = hwirq;
> 
>     if (lookup)
>         irq = find_mapping(hwirq);
> 
>     if (!irq || irq >= nr_irqs)
>        -> BAD

When exactly can that happen ? Only when some hardware sending an IRQ,
but no driver listening to it, or are there other cases ?

By the way: who's supposed to call that function ? Only irqchip's
(and the few soc specific 1st-level irq handlers) ? I'm asking, because
we have lots of gpio drivers, which have their own irq domain, but go
the generic_handle_irq() route. Same for some SOC-specific irqchips.

Should they also call handle_domain_irq() instead ?

> In both cases the only interesting information is that hwirq does not
> resolve to a valid Linux interrupt number and which hwirq number caused
> that.

Don't we also need know which irqchip the hwirq number belongs to ?

> If you look really then you find out that there is exactly _ONE_
> architecture which does anything else than incrementing a counter and/or
> printing stuff: X86, which has a big fat comment explaining why. The
> only way to ack an interrupt on X86 is to issue EOI on the local APIC,
> i.e. it does _not_ need any further information.

Yeah, found it :)

At this point I wonder whether the ack_APIC_irq() call could be done
somewhere further up in the call chain, eg. handle_irq() or
common_interrupt() ?

If that works, we IMHO could drop ack_bad_irq() completely (except for
the counter and printk, which we could consolidate elsewhere anyways)

>> ... rethinking this further ... shouldn't we also pass in even more data
>> (eg. irq_desc, irqchip, ...), so this function can check which hw to
>> actually talk to ?
> 
> There are 3 ways to get there:
> 
>       1) via dummy chip which obviously has no hardware associated

... which also calls print_irq_desc() ..

>       2) via handle_bad_irq() which prints the info already

print_irq_desc() doesn't seem to print the hwirq ... shall we fix this ?

>       3) __handle_domain_irq() which cannot print anything and obviously
>          cannot figure out the hw to talk to because there is no irq
>          descriptor associated.

Okay, what's the conclusion ? Drop printouts in the ack_bad_irq()'s ?

>>>   4) It's invoked from the dummy irq chip which is installed for a
>>>      couple of truly virtual interrupts where the invocation of
>>>      dummy_irq_chip::irq_ack() is indicating wreckage.
>>>
>>>      In that case the Linux irq number is the thing which is printed.
>>>
>>> So no. It's not just inconsistent it's in some places outright
>>> wrong. What we really want is:
>>>
>>> ack_bad_irq(int hwirq, int virq)
>>
>> is 'int' correct here ?
> 
> This was just for illustration.

Okay, thanks. Just discovered already have an irq_hw_number_t, which
doesn't seem to be used everywhere ... shall we fix that ?

>> OTOH: since both callers (dummychip.c, handle.c) already dump out before
>> ack_bad_irq(), do we need to print out anything at all ?
> 
> Not all callers print something, but yes this could do with some general
> cleanup.

I've found three callers, only one (__handle_domain_irq() in irqdesc.c)
doesn't print out anything. I belive, adding a pr_warn() here and drop
all the printouts in ack_bad_irq()'s makes sense.

> The error counter is independent of that, but yes there is room for
> consolidation.

Ok, I've already started hacking a bit here: adding an atomic_t counter
in kernel/irq/handle.c and inline'd accessor functions in
include/asm-generic/irq.h (just feeling that accessors are a bit cleaner
than direct access). Would that be okay ?

By the way: I still wonder whether my case should have ever reached
ack_bad_irq().

The irqdescs had been allocated via devm_irq_alloc_descs(), and the
driver just called generic_handle_irq() with base irq + gpio nr.
So, IMHO it was a valid linux irq number, but no (explicit) handler.

I wonder whether ack'ing those virtual irqs onto hw could be harmful.


--mtx

-- 
---
Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

      reply	other threads:[~2020-12-16 16:46 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-07 14:31 [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings Enrico Weigelt, metux IT consult
2020-12-08  2:11 ` Michael Ellerman
2020-12-08 14:42   ` Helge Deller
2020-12-08 23:01   ` Thomas Gleixner
2020-12-15 20:12     ` Enrico Weigelt, metux IT consult
2020-12-15 22:12       ` Thomas Gleixner
2020-12-16 16:42         ` Enrico Weigelt, metux IT consult [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7cee04bd-b962-c6cd-19d7-b1f63926f570@metux.net \
    --to=info@metux.net \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=deller@gmx.de \
    --cc=hpa@zytor.com \
    --cc=jdike@addtoit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-um@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=richard@nod.at \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).