linux-kernel.vger.kernel.org archive mirror
* common_interrupt: No irq handler for vector
@ 2020-12-11 20:41 Shuah Khan
  2020-12-12 19:33 ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Shuah Khan @ 2020-12-11 20:41 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

I am debugging __common_interrupt: 1.55 No irq handler for vector
messages and noticed comments and code don't agree:

arch/x86/kernel/apic/msi.c: msi_set_affinity() says:


  * If the vector is in use then the installed device handler will
  * denote it as spurious which is no harm as this is a rare event
  * and interrupt handlers have to cope with spurious interrupts
  * anyway. If the vector is unused, then it is marked so it won't
  * trigger the 'No irq handler for vector' warning in
  * common_interrupt().

common_interrupt() prints the message if the vector is unused (VECTOR_UNUSED):

ack_APIC_irq();

if (desc == VECTOR_UNUSED) {
     pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
                           __func__, smp_processor_id(), vector);
}

Something wrong here?

thanks,
-- Shuah




* Re: common_interrupt: No irq handler for vector
  2020-12-11 20:41 common_interrupt: No irq handler for vector Shuah Khan
@ 2020-12-12 19:33 ` Thomas Gleixner
  2020-12-14 16:11   ` Shuah Khan
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2020-12-12 19:33 UTC
  To: Shuah Khan, Ingo Molnar, Borislav Petkov, H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

On Fri, Dec 11 2020 at 13:41, Shuah Khan wrote:

> I am debugging __common_interrupt: 1.55 No irq handler for vector
> messages and noticed comments and code don't agree:

I bet that's on an AMD system with broken AGESA BIOS.... Good luck
debugging it :) BIOS updates are on the way so I'm told.

> arch/x86/kernel/apic/msi.c: msi_set_affinity() says:
>
>
>   * If the vector is in use then the installed device handler will
>   * denote it as spurious which is no harm as this is a rare event
>   * and interrupt handlers have to cope with spurious interrupts
>   * anyway. If the vector is unused, then it is marked so it won't
>   * trigger the 'No irq handler for vector' warning in
>   * common_interrupt().
>
> common_interrupt() prints the message if the vector is unused (VECTOR_UNUSED):
>
> ack_APIC_irq();
>
> if (desc == VECTOR_UNUSED) {
>      pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>                            __func__, smp_processor_id(), vector);
> }
>
> Something wrong here?

No. It's perfectly correct in the MSI code. See further down.

	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);

Thanks,

        tglx


* Re: common_interrupt: No irq handler for vector
  2020-12-12 19:33 ` Thomas Gleixner
@ 2020-12-14 16:11   ` Shuah Khan
  2020-12-14 20:41     ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Shuah Khan @ 2020-12-14 16:11 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Greg Kroah-Hartman, H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

On 12/12/20 12:33 PM, Thomas Gleixner wrote:
> On Fri, Dec 11 2020 at 13:41, Shuah Khan wrote:
> 
>> I am debugging __common_interrupt: 1.55 No irq handler for vector
>> messages and noticed comments and code don't agree:
> 
> I bet that's on an AMD system with broken AGESA BIOS.... Good luck
> debugging it :) BIOS updates are on the way so I'm told.
> 

Interesting. The behavior I am seeing doesn't seem to be consistent
with a BIOS problem. I don't see these messages on 5.10-rc7. I started
seeing them on stable releases: they started right around 5.9.9 and
are not present on 5.9.7.

I am bisecting to isolate it. The same issue shows up on all stables:
5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.

+ add Greg since I am talking about stable releases.

>> arch/x86/kernel/apic/msi.c: msi_set_affinity() says:
>>
>>
>>    * If the vector is in use then the installed device handler will
>>    * denote it as spurious which is no harm as this is a rare event
>>    * and interrupt handlers have to cope with spurious interrupts
>>    * anyway. If the vector is unused, then it is marked so it won't
>>    * trigger the 'No irq handler for vector' warning in
>>    * common_interrupt().
>>
>> common_interrupt() prints the message if the vector is unused (VECTOR_UNUSED):
>>
>> ack_APIC_irq();
>>
>> if (desc == VECTOR_UNUSED) {
>>       pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>>                             __func__, smp_processor_id(), vector);
>> }
>>
>> Something wrong here?
> 
> No. It's perfectly correct in the MSI code. See further down.
> 
> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
> 

I am asking about the inconsistency between the comment and the actual
message: the comment implies that if the vector is in VECTOR_UNUSED state,
this message won't be triggered in common_interrupt(). Based on that, my
read is that the comment might be wrong if the code is correct, as you say.

thanks,
-- Shuah


* Re: common_interrupt: No irq handler for vector
  2020-12-14 16:11   ` Shuah Khan
@ 2020-12-14 20:41     ` Thomas Gleixner
  2020-12-14 20:50       ` Thomas Gleixner
  2020-12-14 20:57       ` Shuah Khan
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2020-12-14 20:41 UTC
  To: Shuah Khan, Ingo Molnar, Borislav Petkov, Greg Kroah-Hartman,
	H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

On Mon, Dec 14 2020 at 09:11, Shuah Khan wrote:
> On 12/12/20 12:33 PM, Thomas Gleixner wrote:
>> On Fri, Dec 11 2020 at 13:41, Shuah Khan wrote:
>> 
>>> I am debugging __common_interrupt: 1.55 No irq handler for vector
>>> messages and noticed comments and code don't agree:
>> 
>> I bet that's on an AMD system with broken AGESA BIOS.... Good luck
>> debugging it :) BIOS updates are on the way so I'm told.
>> 
> Interesting. The behavior I am seeing doesn't seem to be consistent
> with a BIOS problem. I don't see these messages on 5.10-rc7. I started
> seeing them on stable releases: they started right around 5.9.9 and
> are not present on 5.9.7.

What kind of machine?

> I am bisecting to isolate it. The same issue shows up on all stables:
> 5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
> it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.

Can you provide some more details, e.g. dmesg please?

>> No. It's perfectly correct in the MSI code. See further down.
>> 
>> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
>> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
>> 
>
> I am asking about the inconsistency between the comment and the actual
> message: the comment implies that if the vector is in VECTOR_UNUSED state,
> this message won't be triggered in common_interrupt(). Based on that, my
> read is that the comment might be wrong if the code is correct, as you say.

The comment says:

  >>    * anyway. If the vector is unused, then it is marked so it won't
  >>    * trigger the 'No irq handler for vector' warning in
  >>    * common_interrupt().

  If the vector is unused, then it is _marked_ so ....

It perhaps should explicitly say 'is marked as VECTOR_RETRIGGERED' to make
it clear.

Thanks,

        tglx


* Re: common_interrupt: No irq handler for vector
  2020-12-14 20:41     ` Thomas Gleixner
@ 2020-12-14 20:50       ` Thomas Gleixner
  2020-12-14 20:57       ` Shuah Khan
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2020-12-14 20:50 UTC
  To: Shuah Khan, Ingo Molnar, Borislav Petkov, Greg Kroah-Hartman,
	H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

On Mon, Dec 14 2020 at 21:41, Thomas Gleixner wrote:
> On Mon, Dec 14 2020 at 09:11, Shuah Khan wrote:
>> On 12/12/20 12:33 PM, Thomas Gleixner wrote:
>>> On Fri, Dec 11 2020 at 13:41, Shuah Khan wrote:
>>> 
>>>> I am debugging __common_interrupt: 1.55 No irq handler for vector
>>>> messages and noticed comments and code don't agree:
>>> 
>>> I bet that's on an AMD system with broken AGESA BIOS.... Good luck
>>> debugging it :) BIOS updates are on the way so I'm told.
>>> 
>> Interesting. The behavior I am seeing doesn't seem to be consistent
>> with a BIOS problem. I don't see these messages on 5.10-rc7. I started
>> seeing them on stable releases: they started right around 5.9.9 and
>> are not present on 5.9.7.
>
> What kind of machine?
>
>> I am bisecting to isolate it. The same issue shows up on all stables:
>> 5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
>> it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.
>
> Can you provide some more details, e.g. dmesg please?
>
>>> No. It's perfectly correct in the MSI code. See further down.
>>> 
>>> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
>>> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
>>> 
>>
>> I am asking about the inconsistency between the comment and the actual
>> message: the comment implies that if the vector is in VECTOR_UNUSED state,
>> this message won't be triggered in common_interrupt(). Based on that, my
>> read is that the comment might be wrong if the code is correct, as you say.
>
> The comment says:
>
>   >>    * anyway. If the vector is unused, then it is marked so it won't
>   >>    * trigger the 'No irq handler for vector' warning in
>   >>    * common_interrupt().
>
>   If the vector is unused, then it is _marked_ so ....
>
> It perhaps should explicitly say 'is marked as VECTOR_RETRIGGERED' to make
> it clear.

And it's only marked for this particular case to prevent the message
from being shown, because the insanities we need to do to migrate
unmaskable (*sigh*) MSI interrupts can trigger that warning, which would
be just wrong and confusing. Your warning is _not_ coming from a broken
MSI migration attempt, believe me.

Thanks,

        tglx


* Re: common_interrupt: No irq handler for vector
  2020-12-14 20:41     ` Thomas Gleixner
  2020-12-14 20:50       ` Thomas Gleixner
@ 2020-12-14 20:57       ` Shuah Khan
  2020-12-14 22:28         ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Shuah Khan @ 2020-12-14 20:57 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Greg Kroah-Hartman, H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

On 12/14/20 1:41 PM, Thomas Gleixner wrote:
> On Mon, Dec 14 2020 at 09:11, Shuah Khan wrote:
>> On 12/12/20 12:33 PM, Thomas Gleixner wrote:
>>> On Fri, Dec 11 2020 at 13:41, Shuah Khan wrote:
>>>
>>>> I am debugging __common_interrupt: 1.55 No irq handler for vector
>>>> messages and noticed comments and code don't agree:
>>>
>>> I bet that's on an AMD system with broken AGESA BIOS.... Good luck
>>> debugging it :) BIOS updates are on the way so I'm told.
>>>
>> Interesting. The behavior I am seeing doesn't seem to be consistent
>> with a BIOS problem. I don't see these messages on 5.10-rc7. I started
>> seeing them on stable releases: they started right around 5.9.9 and
>> are not present on 5.9.7.
> 
> What kind of machine?

Here is the processor and BIOS info:
AMD Ryzen 7 4700G with Radeon Graphics
LENOVO ThinkCentre Embedded Controller -[O4ZCT12A-1.12]-
LENOVO ThinkCentre BIOS Boot Block Revision 1.1C

> 
>> I am bisecting to isolate it. The same issue shows up on all stables:
>> 5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
>> it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.
> 
> Can you provide some more details, e.g. dmesg please?
> 

__common_interrupt: 1.55 No irq handler for vector
__common_interrupt: 2.55 No irq handler for vector
__common_interrupt: 3.55 No irq handler for vector
__common_interrupt: 4.55 No irq handler for vector
__common_interrupt: 5.55 No irq handler for vector
__common_interrupt: 6.55 No irq handler for vector
__common_interrupt: 7.55 No irq handler for vector
__common_interrupt: 8.55 No irq handler for vector
__common_interrupt: 9.55 No irq handler for vector
__common_interrupt: 10.55 No irq handler for vector

>>> No. It's perfectly correct in the MSI code. See further down.
>>>
>>> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
>>> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
>>>
>>
>> I am asking about the inconsistency between the comment and the actual
>> message: the comment implies that if the vector is in VECTOR_UNUSED state,
>> this message won't be triggered in common_interrupt(). Based on that, my
>> read is that the comment might be wrong if the code is correct, as you say.
> 
> The comment says:
> 
>    >>    * anyway. If the vector is unused, then it is marked so it won't
>    >>    * trigger the 'No irq handler for vector' warning in
>    >>    * common_interrupt().
> 
>    If the vector is unused, then it is _marked_ so ....

See the messages above.

> 
> It perhaps should explicitly say 'is marked as VECTOR_RETRIGGERED' to make
> it clear.
> 

Possibly. I am running a bisect starting at v5.9.7 (good) against
v5.9.13 to see why these problems started showing up.

thanks,
-- Shuah



* Re: common_interrupt: No irq handler for vector
  2020-12-14 20:57       ` Shuah Khan
@ 2020-12-14 22:28         ` Thomas Gleixner
  2020-12-14 22:41           ` Shuah Khan
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2020-12-14 22:28 UTC
  To: Shuah Khan, Ingo Molnar, Borislav Petkov, Greg Kroah-Hartman,
	H. Peter Anvin
  Cc: x86, Linux Kernel Mailing List, Shuah Khan

Shuah,

On Mon, Dec 14 2020 at 13:57, Shuah Khan wrote:
> On 12/14/20 1:41 PM, Thomas Gleixner wrote:
> Here is the processor and BIOS info:
> AMD Ryzen 7 4700G with Radeon Graphics
> LENOVO ThinkCentre Embedded Controller -[O4ZCT12A-1.12]-
> LENOVO ThinkCentre BIOS Boot Block Revision 1.1C
>
>> 
>>> I am bisecting to isolate it. The same issue shows up on all stables:
>>> 5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
>>> it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.
>> 
>> Can you provide some more details, e.g. dmesg please?
>> 
>
> __common_interrupt: 1.55 No irq handler for vector
> __common_interrupt: 2.55 No irq handler for vector
> __common_interrupt: 3.55 No irq handler for vector
> __common_interrupt: 4.55 No irq handler for vector
> __common_interrupt: 5.55 No irq handler for vector
> __common_interrupt: 6.55 No irq handler for vector
> __common_interrupt: 7.55 No irq handler for vector
> __common_interrupt: 8.55 No irq handler for vector
> __common_interrupt: 9.55 No irq handler for vector
> __common_interrupt: 10.55 No irq handler for vector

This _IS_ the AGESA BIOS bug.

>>>> No. It's perfectly correct in the MSI code. See further down.
>>>>
>>>> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
>>>> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
>>>>
>>>
>>> I am asking about the inconsistency between the comment and the actual
>>> message: the comment implies that if the vector is in VECTOR_UNUSED state,
>>> this message won't be triggered in common_interrupt(). Based on that, my
>>> read is that the comment might be wrong if the code is correct, as you say.
>> 
>> The comment says:
>> 
>>    >>    * anyway. If the vector is unused, then it is marked so it won't
>>    >>    * trigger the 'No irq handler for vector' warning in
>>    >>    * common_interrupt().
>> 
>>    If the vector is unused, then it is _marked_ so ....
>
> See the messages above.

This code has absolutely nothing to do with these messages. It marks
the vector RETRIGGERED so that the warning cannot happen if the MSI
migration causes this spurious vector to be emitted. That marking is
there _because_ the migration occasionally triggered the warning, which
is unavoidable due to the silliness of hardware.

The problem is that the buggy BIOS causes vector 55, which is the legacy
x86 interrupt 7, to be sent to the secondary CPUs 1-10 when they come up
for the first time during boot. This has been reported to death already,
and AMD confirmed that it is an AGESA BIOS bug and that it is fixed with
AGESA BIOS version 1.1.8.0.

The reason why it shows up now might be timing related, nothing else.

Thanks,

        tglx


* Re: common_interrupt: No irq handler for vector
  2020-12-14 22:28         ` Thomas Gleixner
@ 2020-12-14 22:41           ` Shuah Khan
  0 siblings, 0 replies; 8+ messages in thread
From: Shuah Khan @ 2020-12-14 22:41 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Greg Kroah-Hartman, H. Peter Anvin, Shuah Khan
  Cc: x86, Linux Kernel Mailing List

On 12/14/20 3:28 PM, Thomas Gleixner wrote:
> Shuah,
> 
> On Mon, Dec 14 2020 at 13:57, Shuah Khan wrote:
>> On 12/14/20 1:41 PM, Thomas Gleixner wrote:
>> Here is the processor and BIOS info:
>> AMD Ryzen 7 4700G with Radeon Graphics
>> LENOVO ThinkCentre Embedded Controller -[O4ZCT12A-1.12]-
>> LENOVO ThinkCentre BIOS Boot Block Revision 1.1C
>>
>>>
>>>> I am bisecting to isolate it. The same issue shows up on all stables:
>>>> 5.4, 4.19 and so on. If it were a BIOS problem, I would expect to see
>>>> it on 5.10-rc7 and wouldn't have expected to start seeing it in 5.9.9.
>>>
>>> Can you provide some more details, e.g. dmesg please?
>>>
>>
>> __common_interrupt: 1.55 No irq handler for vector
>> __common_interrupt: 2.55 No irq handler for vector
>> __common_interrupt: 3.55 No irq handler for vector
>> __common_interrupt: 4.55 No irq handler for vector
>> __common_interrupt: 5.55 No irq handler for vector
>> __common_interrupt: 6.55 No irq handler for vector
>> __common_interrupt: 7.55 No irq handler for vector
>> __common_interrupt: 8.55 No irq handler for vector
>> __common_interrupt: 9.55 No irq handler for vector
>> __common_interrupt: 10.55 No irq handler for vector
> 
> This _IS_ the AGESA BIOS bug.
> 
>>>>> No. It's perfectly correct in the MSI code. See further down.
>>>>>
>>>>> 	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
>>>>> 		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
>>>>>
>>>>
>>>> I am asking about the inconsistency between the comment and the actual
>>>> message: the comment implies that if the vector is in VECTOR_UNUSED state,
>>>> this message won't be triggered in common_interrupt(). Based on that, my
>>>> read is that the comment might be wrong if the code is correct, as you say.
>>>
>>> The comment says:
>>>
>>>     >>    * anyway. If the vector is unused, then it is marked so it won't
>>>     >>    * trigger the 'No irq handler for vector' warning in
>>>     >>    * common_interrupt().
>>>
>>>     If the vector is unused, then it is _marked_ so ....
>>
>> See the messages above.
> 
> This code has absolutely nothing to do with these messages. It marks
> the vector RETRIGGERED so that the warning cannot happen if the MSI
> migration causes this spurious vector to be emitted. That marking is
> there _because_ the migration occasionally triggered the warning, which
> is unavoidable due to the silliness of hardware.
> 
> The problem is that the buggy BIOS causes vector 55, which is the legacy
> x86 interrupt 7, to be sent to the secondary CPUs 1-10 when they come up
> for the first time during boot. This has been reported to death already,
> and AMD confirmed that it is an AGESA BIOS bug and that it is fixed with
> AGESA BIOS version 1.1.8.0.
> 
> The reason why it shows up now might be timing related, nothing else.
> 

Thank you for confirming. I will save myself the bisect time and look
for BIOS update.

thanks,
-- Shuah



end of thread

Thread overview: 8+ messages
2020-12-11 20:41 common_interrupt: No irq handler for vector Shuah Khan
2020-12-12 19:33 ` Thomas Gleixner
2020-12-14 16:11   ` Shuah Khan
2020-12-14 20:41     ` Thomas Gleixner
2020-12-14 20:50       ` Thomas Gleixner
2020-12-14 20:57       ` Shuah Khan
2020-12-14 22:28         ` Thomas Gleixner
2020-12-14 22:41           ` Shuah Khan
