* False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops @ 2019-02-19 15:53 Hans de Goede 2019-02-19 21:01 ` Thomas Gleixner 0 siblings, 1 reply; 16+ messages in thread From: Hans de Goede @ 2019-02-19 15:53 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Linux Kernel Mailing List Hi Thomas, Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops, see e.g.: https://bugzilla.redhat.com/show_bug.cgi?id=1551605 Which contains this dmesg snippet: Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ... Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration: Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1 Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector Feb 07 20:14:29 localhost.localdomain kernel: #2 Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector Feb 07 20:14:29 localhost.localdomain kernel: #3 Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1 Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS) It seems that we get an IRQ for each CPU as we bring it online, which feels to me like it is some sorta false-positive. I temporarily have access to a loaner laptop for a couple of weeks which shows the same errors and I would like to fix this, but I don't really know how to fix this. Note if you want I can set up root ssh-access to the laptop. Regards, Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-02-19 15:53 False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops Hans de Goede @ 2019-02-19 21:01 ` Thomas Gleixner 2019-02-19 21:47 ` Lendacky, Thomas 2019-02-21 12:28 ` Hans de Goede 0 siblings, 2 replies; 16+ messages in thread From: Thomas Gleixner @ 2019-02-19 21:01 UTC (permalink / raw) To: Hans de Goede Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov, Tom Lendacky Hans, On Tue, 19 Feb 2019, Hans de Goede wrote: Cc+: ACPI/AMD folks > Various people are reporting false positive "do_IRQ: #.55 No irq handler for > vector" > messages on AMD ryzen based laptops, see e.g.: > > https://bugzilla.redhat.com/show_bug.cgi?id=1551605 > > Which contains this dmesg snippet: > > Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs > ... > Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration: > Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1 > Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for > vector > Feb 07 20:14:29 localhost.localdomain kernel: #2 > Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for > vector > Feb 07 20:14:29 localhost.localdomain kernel: #3 > Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for > vector > Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs > Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1 > Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors > activated (15968.49 BogoMIPS) > > It seems that we get an IRQ for each CPU as we bring it online, > which feels to me like it is some sorta false-positive. Sigh, that looks like BIOS value add again. It's not a false positive. Something _IS_ sending a vector 55 to these CPUs for whatever reason. 
> I temporarily have access to a loaner laptop for a couple of weeks which shows > the same errors and I would like to fix this, but I don't really know how to > fix this. Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and check in the files there whether vector 55 is used on CPU0 and which device is associated with it. I bet it's a legacy IRQ and as that space starts at 48 (IRQ0) this should be IRQ9 which is usually - DRUMROLL - the ACPI interrupt. The kernel clearly sets that up to be delivered to CPU 0 only, but I've seen before that the BIOS value add thinks that this setup is not relevant. /me goes off and sings LALALA > Note if you want I can set up root ssh-access to the laptop. As a last resort. root ssh - SHUDDER - Ooops now I spilled my preferred password for that :) Thanks, tglx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-02-19 21:01 ` Thomas Gleixner @ 2019-02-19 21:47 ` Lendacky, Thomas 2019-02-21 12:30 ` Hans de Goede 2019-02-21 12:28 ` Hans de Goede 1 sibling, 1 reply; 16+ messages in thread From: Lendacky, Thomas @ 2019-02-19 21:47 UTC (permalink / raw) To: Thomas Gleixner, Hans de Goede Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov On 2/19/19 3:01 PM, Thomas Gleixner wrote: > Hans, > > On Tue, 19 Feb 2019, Hans de Goede wrote: > > Cc+: ACPI/AMD folks > >> Various people are reporting false positive "do_IRQ: #.55 No irq handler for >> vector" >> messages on AMD ryzen based laptops, see e.g.: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >> >> Which contains this dmesg snippet: >> >> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs >> ... >> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration: >> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1 >> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for >> vector >> Feb 07 20:14:29 localhost.localdomain kernel: #2 >> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for >> vector >> Feb 07 20:14:29 localhost.localdomain kernel: #3 >> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for >> vector >> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs >> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1 >> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors >> activated (15968.49 BogoMIPS) >> >> It seems that we get an IRQ for each CPU as we bring it online, >> which feels to me like it is some sorta false-positive. > > Sigh, that looks like BIOS value add again. > > It's not a false positive. Something _IS_ sending a vector 55 to these CPUs > for whatever reason. 
> I remember seeing something like this in the past and it turned out to be a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 interrupt controller when only the BSP should. During POST the APs were exposed to ExtINT/INTR events as a result of the mis-configuration (probably due to a UEFI timer-tick using the 8259) and this left a pending ExtINT/INTR interrupt latched on the APs. When the APs were started by the OS, the latched ExtINT/INTR interrupt is processed shortly after the OS enables interrupts. The AP then queries the 8259 to identify the vector number (which is the value of the 8259's ICW2 register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, since no interrupts are actually pending, the 8259 will respond with IRQ7 (spurious interrupt) yielding a vector of 0x37 or 55. The OS was not expecting vector 55 and printed the message. From the Intel Developer's Manual: Vol 3a, Section 10.5.1: "Only one processor in the system should have an LVT entry configured to use the ExtINT delivery mode." Not saying this is the problem, but very well could be. Thanks, Tom >> I temporarily have access to a loaner laptop for a couple of weeks which shows >> the same errors and I would like to fix this, but I don't really know how to >> fix this. > > Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there > whether vector 55 is used on CPU0 and which device is associated to that. > > I bet its a legacy IRQ and as that space starts at 48 (IRQ0) this should be > IRQ9 which is usually - DRUMROLL - the ACPI interrupt. > > The kernel clearly sets that up to be delivered to CPU 0 only, but I've > seen that before that the BIOS value add thinks that this setup is not > relevant. > > /me goes off and sings LALALA > >> Note if you want I can set up root ssh-access to the laptop. > > As a least resort. 
root ssh - SHUDDER - Ooops now I spilled my preferred > password for that :) > > Thanks, > > tglx > ^ permalink raw reply [flat|nested] 16+ messages in thread
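The vector arithmetic in Tom's explanation can be checked with a small sketch. This is only an illustration of the numbers quoted in the thread (ICW2 set to 0x30, the 8259 answering with its spurious IRQ7); the function and constant names are hypothetical and do not come from the kernel.

```python
# Hypothetical helper names; only the arithmetic comes from the thread.
SPURIOUS_MASTER_IRQ = 7  # the master 8259 answers with IRQ7 when nothing is pending


def pic_vector(icw2_base: int, irq_level: int) -> int:
    """Return the vector an 8259 hands to the CPU for a given IRQ level."""
    if not 0 <= irq_level <= 7:
        raise ValueError("a single 8259 only has IRQ levels 0-7")
    # ICW2 supplies the upper five bits of the vector, the IRQ level the low
    # three, so for an aligned base this equals "ICW2 + IRQ level".
    return (icw2_base & 0xF8) | irq_level


vector = pic_vector(0x30, SPURIOUS_MASTER_IRQ)
print(f"vector = {vector:#x} ({vector})")
```

With a base of 0x30 (48 decimal), offset 7 gives 0x37, which is the 55 seen in the "do_IRQ: N.55" messages.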
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-02-19 21:47 ` Lendacky, Thomas @ 2019-02-21 12:30 ` Hans de Goede 2019-03-03 10:57 ` Hans de Goede 0 siblings, 1 reply; 16+ messages in thread From: Hans de Goede @ 2019-02-21 12:30 UTC (permalink / raw) To: Lendacky, Thomas, Thomas Gleixner Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov Hi, On 19-02-19 22:47, Lendacky, Thomas wrote: > On 2/19/19 3:01 PM, Thomas Gleixner wrote: >> Hans, >> >> On Tue, 19 Feb 2019, Hans de Goede wrote: >> >> Cc+: ACPI/AMD folks >> >>> Various people are reporting false positive "do_IRQ: #.55 No irq handler for >>> vector" >>> messages on AMD ryzen based laptops, see e.g.: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >>> >>> Which contains this dmesg snippet: >>> >>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs >>> ... >>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration: >>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1 >>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for >>> vector >>> Feb 07 20:14:29 localhost.localdomain kernel: #2 >>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for >>> vector >>> Feb 07 20:14:29 localhost.localdomain kernel: #3 >>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for >>> vector >>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs >>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1 >>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors >>> activated (15968.49 BogoMIPS) >>> >>> It seems that we get an IRQ for each CPU as we bring it online, >>> which feels to me like it is some sorta false-positive. >> >> Sigh, that looks like BIOS value add again. >> >> It's not a false positive. 
Something _IS_ sending a vector 55 to these CPUs >> for whatever reason. >> > > I remember seeing something like this in the past and it turned out to be > a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 > interrupt controller when only the BSP should. During POST the APs were > exposed to ExtINT/INTR events as a result of the mis-configuration > (probably due to a UEFI timer-tick using the 8259) and this left a pending > ExtINT/INTR interrupt latched on the APs. > > When the APs were started by the OS, the latched ExtINT/INTR interrupt is > processed shortly after the OS enables interrupts. The AP then queries the > 8259 to identify the vector number (which is the value of the 8259's ICW2 > register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, > since no interrupts are actually pending, the 8259 will respond with IRQ7 > (spurious interrupt) yielding a vector of 0x37 or 55. > > The OS was not expecting vector 55 and printed the message. > > From the Intel Developer's Manual: Vol 3a, Section 10.5.1: > "Only one processor in the system should have an LVT entry configured to > use the ExtINT delivery mode." > > Not saying this is the problem, but very well could be. That sounds like a likely candidate, especially since this only happens once per CPU, when we first online the CPU. Can you provide me with a patch with some printk-s / pr_debugs to test for this, then I can build a kernel with that patch added and we can see if your hypothesis is right. Regards, Hans > > Thanks, > Tom > >>> I temporarily have access to a loaner laptop for a couple of weeks which shows >>> the same errors and I would like to fix this, but I don't really know how to >>> fix this. >> >> Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there >> whether vector 55 is used on CPU0 and which device is associated to that.
>> >> I bet its a legacy IRQ and as that space starts at 48 (IRQ0) this should be >> IRQ9 which is usually - DRUMROLL - the ACPI interrupt. >> >> The kernel clearly sets that up to be delivered to CPU 0 only, but I've >> seen that before that the BIOS value add thinks that this setup is not >> relevant. >> >> /me goes off and sings LALALA >> >>> Note if you want I can set up root ssh-access to the laptop. >> >> As a least resort. root ssh - SHUDDER - Ooops now I spilled my preferred >> password for that :) >> >> Thanks, >> >> tglx >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-02-21 12:30 ` Hans de Goede @ 2019-03-03 10:57 ` Hans de Goede 2019-03-05 14:06 ` Lendacky, Thomas 0 siblings, 1 reply; 16+ messages in thread From: Hans de Goede @ 2019-03-03 10:57 UTC (permalink / raw) To: Lendacky, Thomas, Thomas Gleixner Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov Hi, On 21-02-19 13:30, Hans de Goede wrote: > Hi, > > On 19-02-19 22:47, Lendacky, Thomas wrote: >> On 2/19/19 3:01 PM, Thomas Gleixner wrote: >>> Hans, >>> >>> On Tue, 19 Feb 2019, Hans de Goede wrote: >>> >>> Cc+: ACPI/AMD folks >>> >>>> Various people are reporting false positive "do_IRQ: #.55 No irq handler for >>>> vector" >>>> messages on AMD ryzen based laptops, see e.g.: >>>> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >>>> >>>> Which contains this dmesg snippet: >>>> >>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs >>>> ... >>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration: >>>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1 >>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for >>>> vector >>>> Feb 07 20:14:29 localhost.localdomain kernel: #2 >>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for >>>> vector >>>> Feb 07 20:14:29 localhost.localdomain kernel: #3 >>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for >>>> vector >>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs >>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1 >>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors >>>> activated (15968.49 BogoMIPS) >>>> >>>> It seems that we get an IRQ for each CPU as we bring it online, >>>> which feels to me like it is some sorta false-positive. 
>>> >>> Sigh, that looks like BIOS value add again. >>> >>> It's not a false positive. Something _IS_ sending a vector 55 to these CPUs >>> for whatever reason. >>> >> >> I remember seeing something like this in the past and it turned out to be >> a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 >> interrupt controller when only the BSP should. During POST the APs were >> exposed to ExtINT/INTR events as a result of the mis-configuration >> (probably due to a UEFI timer-tick using the 8259) and this left a pending >> ExtINT/INTR interrupt latched on the APs. >> >> When the APs were started by the OS, the latched ExtINT/INTR interrupt is >> processed shortly after the OS enables interrupts. The AP then queries the >> 8259 to identify the vector number (which is the value of the 8259's ICW2 >> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, >> since no interrupts are actually pending, the 8259 will respond with IRQ7 >> (spurious interrupt) yielding a vector of 0x37 or 55. >> >> The OS was not expecting vector 55 and printed the message. >> >> From the Intel Developer's Manual: Vol 3a, Section 10.5.1: >> "Only one processor in the system should have an LVT entry configured to >> use the ExtINT delivery mode." >> >> Not saying this is the problem, but very well could be. > > That sounds like a likely candidate, esp. also since this only happens > once per CPU when we first only the CPU. > > Can you provide me with a patch with some printk-s / pr_debugs to > test for this, then I can build a kernel with that patch added and > we can see if your hypothesis is right. Ping? I like your theory, can you provide some help with debugging this further (to prove that your theory is correct ) ? Regards, Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-03-03 10:57 ` Hans de Goede @ 2019-03-05 14:06 ` Lendacky, Thomas 2019-03-05 16:02 ` Hans de Goede 0 siblings, 1 reply; 16+ messages in thread From: Lendacky, Thomas @ 2019-03-05 14:06 UTC (permalink / raw) To: Hans de Goede, Thomas Gleixner Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov On 3/3/19 4:57 AM, Hans de Goede wrote: > Hi, > > On 21-02-19 13:30, Hans de Goede wrote: >> Hi, >> >> On 19-02-19 22:47, Lendacky, Thomas wrote: >>> On 2/19/19 3:01 PM, Thomas Gleixner wrote: >>>> Hans, >>>> >>>> On Tue, 19 Feb 2019, Hans de Goede wrote: >>>> >>>> Cc+: ACPI/AMD folks >>>> >>>>> Various people are reporting false positive "do_IRQ: #.55 No irq >>>>> handler for >>>>> vector" >>>>> messages on AMD ryzen based laptops, see e.g.: >>>>> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >>>>> >>>>> Which contains this dmesg snippet: >>>>> >>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up >>>>> secondary CPUs >>>>> ... >>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP >>>>> configuration: >>>>> Feb 07 20:14:29 localhost.localdomain kernel: .... 
node #0, >>>>> CPUs: #1 >>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq >>>>> handler for >>>>> vector >>>>> Feb 07 20:14:29 localhost.localdomain kernel: #2 >>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq >>>>> handler for >>>>> vector >>>>> Feb 07 20:14:29 localhost.localdomain kernel: #3 >>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq >>>>> handler for >>>>> vector >>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, >>>>> 4 CPUs >>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical >>>>> packages: 1 >>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 >>>>> processors >>>>> activated (15968.49 BogoMIPS) >>>>> >>>>> It seems that we get an IRQ for each CPU as we bring it online, >>>>> which feels to me like it is some sorta false-positive. >>>> >>>> Sigh, that looks like BIOS value add again. >>>> >>>> It's not a false positive. Something _IS_ sending a vector 55 to these >>>> CPUs >>>> for whatever reason. >>>> >>> >>> I remember seeing something like this in the past and it turned out to be >>> a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 >>> interrupt controller when only the BSP should. During POST the APs were >>> exposed to ExtINT/INTR events as a result of the mis-configuration >>> (probably due to a UEFI timer-tick using the 8259) and this left a pending >>> ExtINT/INTR interrupt latched on the APs. >>> >>> When the APs were started by the OS, the latched ExtINT/INTR interrupt is >>> processed shortly after the OS enables interrupts. The AP then queries the >>> 8259 to identify the vector number (which is the value of the 8259's ICW2 >>> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, >>> since no interrupts are actually pending, the 8259 will respond with IRQ7 >>> (spurious interrupt) yielding a vector of 0x37 or 55. 
>>> >>> The OS was not expecting vector 55 and printed the message. >>> >>> From the Intel Developer's Manual: Vol 3a, Section 10.5.1: >>> "Only one processor in the system should have an LVT entry configured to >>> use the ExtINT delivery mode." >>> >>> Not saying this is the problem, but very well could be. >> >> That sounds like a likely candidate, esp. also since this only happens >> once per CPU when we first only the CPU. >> >> Can you provide me with a patch with some printk-s / pr_debugs to >> test for this, then I can build a kernel with that patch added and >> we can see if your hypothesis is right. > > Ping? I like your theory, can you provide some help with debugging this > further (to prove that your theory is correct ) ? It's been a very long time since I dealt with this and I was only on the periphery. You might be able to print the LVT entries from the APIC and see if any of them have an un-masked ExtINT delivery mode. You would need to do this very early before Linux modifies any values. Or you can report the issue to the OEM and have them check their BIOS code to see if they are doing this. Thanks, Tom > > Regards, > > Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
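The check Tom suggests (look for an un-masked ExtINT delivery mode in the LVT entries) comes down to decoding two fields of each raw LVT register value. The sketch below assumes the usual local APIC LVT layout (delivery mode in bits 8-10, with 0b111 meaning ExtINT, and the mask flag in bit 16). It only decodes values that have already been read; it does not read the APIC itself, and the sample values are made up for illustration, not taken from the affected laptops.

```python
# Field positions assumed from the local APIC LVT register layout.
DELIVERY_MODE_SHIFT = 8
DELIVERY_MODE_MASK = 0x7
DELIVERY_MODE_EXTINT = 0b111
LVT_MASKED_BIT = 1 << 16


def decode_lvt(value: int) -> tuple[int, bool]:
    """Split a raw LVT register value into (delivery mode, masked flag)."""
    mode = (value >> DELIVERY_MODE_SHIFT) & DELIVERY_MODE_MASK
    masked = bool(value & LVT_MASKED_BIT)
    return mode, masked


def is_unmasked_extint(value: int) -> bool:
    """True if the entry would accept ExtINT interrupts from the 8259."""
    mode, masked = decode_lvt(value)
    return mode == DELIVERY_MODE_EXTINT and not masked


# Hypothetical LINT0 values for illustration only.
samples = {
    "BSP LINT0": 0x00000700,        # ExtINT, unmasked
    "AP LINT0 (ok)": 0x00010700,    # ExtINT, masked
    "AP LINT0 (bad)": 0x00000700,   # ExtINT, unmasked on an AP
}
for name, raw in samples.items():
    print(f"{name}: {raw:#010x} unmasked ExtINT = {is_unmasked_extint(raw)}")
```

If the theory holds, only the boot CPU should show an un-masked ExtINT entry; an AP reporting one before Linux reprograms its APIC would point at the firmware.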
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-03-05 14:06 ` Lendacky, Thomas @ 2019-03-05 16:02 ` Hans de Goede 2019-03-05 19:19 ` Hans de Goede 0 siblings, 1 reply; 16+ messages in thread From: Hans de Goede @ 2019-03-05 16:02 UTC (permalink / raw) To: Lendacky, Thomas, Thomas Gleixner Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov Hi, On 05-03-19 15:06, Lendacky, Thomas wrote: > On 3/3/19 4:57 AM, Hans de Goede wrote: >> Hi, >> >> On 21-02-19 13:30, Hans de Goede wrote: >>> Hi, >>> >>> On 19-02-19 22:47, Lendacky, Thomas wrote: >>>> On 2/19/19 3:01 PM, Thomas Gleixner wrote: >>>>> Hans, >>>>> >>>>> On Tue, 19 Feb 2019, Hans de Goede wrote: >>>>> >>>>> Cc+: ACPI/AMD folks >>>>> >>>>>> Various people are reporting false positive "do_IRQ: #.55 No irq >>>>>> handler for >>>>>> vector" >>>>>> messages on AMD ryzen based laptops, see e.g.: >>>>>> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >>>>>> >>>>>> Which contains this dmesg snippet: >>>>>> >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up >>>>>> secondary CPUs >>>>>> ... >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP >>>>>> configuration: >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: .... 
node #0, >>>>>> CPUs: #1 >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq >>>>>> handler for >>>>>> vector >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #2 >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq >>>>>> handler for >>>>>> vector >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #3 >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq >>>>>> handler for >>>>>> vector >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, >>>>>> 4 CPUs >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical >>>>>> packages: 1 >>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 >>>>>> processors >>>>>> activated (15968.49 BogoMIPS) >>>>>> >>>>>> It seems that we get an IRQ for each CPU as we bring it online, >>>>>> which feels to me like it is some sorta false-positive. >>>>> >>>>> Sigh, that looks like BIOS value add again. >>>>> >>>>> It's not a false positive. Something _IS_ sending a vector 55 to these >>>>> CPUs >>>>> for whatever reason. >>>>> >>>> >>>> I remember seeing something like this in the past and it turned out to be >>>> a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 >>>> interrupt controller when only the BSP should. During POST the APs were >>>> exposed to ExtINT/INTR events as a result of the mis-configuration >>>> (probably due to a UEFI timer-tick using the 8259) and this left a pending >>>> ExtINT/INTR interrupt latched on the APs. >>>> >>>> When the APs were started by the OS, the latched ExtINT/INTR interrupt is >>>> processed shortly after the OS enables interrupts. The AP then queries the >>>> 8259 to identify the vector number (which is the value of the 8259's ICW2 >>>> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, >>>> since no interrupts are actually pending, the 8259 will respond with IRQ7 >>>> (spurious interrupt) yielding a vector of 0x37 or 55. 
>>>> >>>> The OS was not expecting vector 55 and printed the message. >>>> >>>> From the Intel Developer's Manual: Vol 3a, Section 10.5.1: >>>> "Only one processor in the system should have an LVT entry configured to >>>> use the ExtINT delivery mode." >>>> >>>> Not saying this is the problem, but very well could be. >>> >>> That sounds like a likely candidate, esp. also since this only happens >>> once per CPU when we first only the CPU. >>> >>> Can you provide me with a patch with some printk-s / pr_debugs to >>> test for this, then I can build a kernel with that patch added and >>> we can see if your hypothesis is right. >> >> Ping? I like your theory, can you provide some help with debugging this >> further (to prove that your theory is correct ) ? > > It's been a very long time since I dealt with this and I was only on the > periphery. You might be able to print the LVT entries from the APIC and > see if any of them have an un-masked ExtINT delivery mode. You would need > to do this very early before Linux modifies any values. I'm afraid I'm not familiar enough with the interrupt / APIC parts of the kernel to do something like this myself. > Or you can report the issue to the OEM and have them check their BIOS > code to see if they are doing this. I will try to go this route, but I'm not really hopeful that will lead to a solution. Regards, Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops 2019-03-05 16:02 ` Hans de Goede @ 2019-03-05 19:19 ` Hans de Goede 2019-03-05 19:31 ` Lendacky, Thomas 0 siblings, 1 reply; 16+ messages in thread From: Hans de Goede @ 2019-03-05 19:19 UTC (permalink / raw) To: Lendacky, Thomas, Thomas Gleixner Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov Hi, On 05-03-19 17:02, Hans de Goede wrote: > Hi, > > On 05-03-19 15:06, Lendacky, Thomas wrote: >> On 3/3/19 4:57 AM, Hans de Goede wrote: >>> Hi, >>> >>> On 21-02-19 13:30, Hans de Goede wrote: >>>> Hi, >>>> >>>> On 19-02-19 22:47, Lendacky, Thomas wrote: >>>>> On 2/19/19 3:01 PM, Thomas Gleixner wrote: >>>>>> Hans, >>>>>> >>>>>> On Tue, 19 Feb 2019, Hans de Goede wrote: >>>>>> >>>>>> Cc+: ACPI/AMD folks >>>>>> >>>>>>> Various people are reporting false positive "do_IRQ: #.55 No irq >>>>>>> handler for >>>>>>> vector" >>>>>>> messages on AMD ryzen based laptops, see e.g.: >>>>>>> >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605 >>>>>>> >>>>>>> Which contains this dmesg snippet: >>>>>>> >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up >>>>>>> secondary CPUs >>>>>>> ... >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP >>>>>>> configuration: >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: .... 
node #0, >>>>>>> CPUs: #1 >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq >>>>>>> handler for >>>>>>> vector >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #2 >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq >>>>>>> handler for >>>>>>> vector >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #3 >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq >>>>>>> handler for >>>>>>> vector >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, >>>>>>> 4 CPUs >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical >>>>>>> packages: 1 >>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 >>>>>>> processors >>>>>>> activated (15968.49 BogoMIPS) >>>>>>> >>>>>>> It seems that we get an IRQ for each CPU as we bring it online, >>>>>>> which feels to me like it is some sorta false-positive. >>>>>> >>>>>> Sigh, that looks like BIOS value add again. >>>>>> >>>>>> It's not a false positive. Something _IS_ sending a vector 55 to these >>>>>> CPUs >>>>>> for whatever reason. >>>>>> >>>>> >>>>> I remember seeing something like this in the past and it turned out to be >>>>> a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 >>>>> interrupt controller when only the BSP should. During POST the APs were >>>>> exposed to ExtINT/INTR events as a result of the mis-configuration >>>>> (probably due to a UEFI timer-tick using the 8259) and this left a pending >>>>> ExtINT/INTR interrupt latched on the APs. >>>>> >>>>> When the APs were started by the OS, the latched ExtINT/INTR interrupt is >>>>> processed shortly after the OS enables interrupts. The AP then queries the >>>>> 8259 to identify the vector number (which is the value of the 8259's ICW2 >>>>> register + the IRQ level). 
The master 8259's ICW2 was set to 0x30 and, >>>>> since no interrupts are actually pending, the 8259 will respond with IRQ7 >>>>> (spurious interrupt) yielding a vector of 0x37 or 55. >>>>> >>>>> The OS was not expecting vector 55 and printed the message. >>>>> >>>>> From the Intel Developer's Manual: Vol 3a, Section 10.5.1: >>>>> "Only one processor in the system should have an LVT entry configured to >>>>> use the ExtINT delivery mode." >>>>> >>>>> Not saying this is the problem, but very well could be. >>>> >>>> That sounds like a likely candidate, esp. also since this only happens >>>> once per CPU when we first only the CPU. >>>> >>>> Can you provide me with a patch with some printk-s / pr_debugs to >>>> test for this, then I can build a kernel with that patch added and >>>> we can see if your hypothesis is right. >>> >>> Ping? I like your theory, can you provide some help with debugging this >>> further (to prove that your theory is correct ) ? >> >> It's been a very long time since I dealt with this and I was only on the >> periphery. You might be able to print the LVT entries from the APIC and >> see if any of them have an un-masked ExtINT delivery mode. You would need >> to do this very early before Linux modifies any values. > > I'm afraid I'm not familiar enough with the interrupt / APIC parts of > the kernel to do something like this myself. > >> Or you can report the issue to the OEM and have them check their BIOS >> code to see if they are doing this. > > I will try to go this route, but I'm not really hopeful that will > lead to a solution. A similar issue is also reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1551605 There are multiple people with different vectors (so likely / possibly different bugs) commenting on that bug, but I just got confirmation that the vector 55 issue is also happening on an Acer system with an AMD A8 processor (I suspect a Ryzen, but that still needs to be confirmed). 
So this seems to be a generic issue with (some) AMD laptops and not specific to one OEM. Regards, Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-05 19:19 ` Hans de Goede
@ 2019-03-05 19:31 ` Lendacky, Thomas
  2019-03-05 19:40 ` Hans de Goede
  0 siblings, 1 reply; 16+ messages in thread
From: Lendacky, Thomas @ 2019-03-05 19:31 UTC
  To: Hans de Goede, Thomas Gleixner
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov

On 3/5/19 1:19 PM, Hans de Goede wrote:
> Hi,
>
> On 05-03-19 17:02, Hans de Goede wrote:
>> Hi,
>>
>> On 05-03-19 15:06, Lendacky, Thomas wrote:
>>> On 3/3/19 4:57 AM, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 21-02-19 13:30, Hans de Goede wrote:
>>>>> Hi,
>>>>>
>>>>> On 19-02-19 22:47, Lendacky, Thomas wrote:
>>>>>> On 2/19/19 3:01 PM, Thomas Gleixner wrote:
>>>>>>> Hans,
>>>>>>>
>>>>>>> On Tue, 19 Feb 2019, Hans de Goede wrote:
>>>>>>>
>>>>>>> Cc+: ACPI/AMD folks
>>>>>>>
>>>>>>>> Various people are reporting false positive "do_IRQ: #.55 No irq
>>>>>>>> handler for
>>>>>>>> vector"
>>>>>>>> messages on AMD ryzen based laptops, see e.g.:
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>>>>>>>
>>>>>>>> Which contains this dmesg snippet:
>>>>>>>>
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up
>>>>>>>> secondary CPUs
>>>>>>>> ...
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP
>>>>>>>> configuration:
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0,
>>>>>>>> CPUs: #1
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq
>>>>>>>> handler for
>>>>>>>> vector
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #2
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq
>>>>>>>> handler for
>>>>>>>> vector
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #3
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq
>>>>>>>> handler for
>>>>>>>> vector
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node,
>>>>>>>> 4 CPUs
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical
>>>>>>>> packages: 1
>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4
>>>>>>>> processors
>>>>>>>> activated (15968.49 BogoMIPS)
>>>>>>>>
>>>>>>>> It seems that we get an IRQ for each CPU as we bring it online,
>>>>>>>> which feels to me like it is some sorta false-positive.
>>>>>>>
>>>>>>> Sigh, that looks like BIOS value add again.
>>>>>>>
>>>>>>> It's not a false positive. Something _IS_ sending a vector 55 to these
>>>>>>> CPUs
>>>>>>> for whatever reason.
>>>>>>>
>>>>>>
>>>>>> I remember seeing something like this in the past and it turned out
>>>>>> to be
>>>>>> a BIOS issue. BIOS was enabling the APs to interact with the legacy
>>>>>> 8259
>>>>>> interrupt controller when only the BSP should. During POST the APs were
>>>>>> exposed to ExtINT/INTR events as a result of the mis-configuration
>>>>>> (probably due to a UEFI timer-tick using the 8259) and this left a
>>>>>> pending
>>>>>> ExtINT/INTR interrupt latched on the APs.
>>>>>>
>>>>>> When the APs were started by the OS, the latched ExtINT/INTR
>>>>>> interrupt is
>>>>>> processed shortly after the OS enables interrupts. The AP then
>>>>>> queries the
>>>>>> 8259 to identify the vector number (which is the value of the 8259's
>>>>>> ICW2
>>>>>> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
>>>>>> since no interrupts are actually pending, the 8259 will respond with
>>>>>> IRQ7
>>>>>> (spurious interrupt) yielding a vector of 0x37 or 55.
>>>>>>
>>>>>> The OS was not expecting vector 55 and printed the message.
>>>>>>
>>>>>> From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
>>>>>> "Only one processor in the system should have an LVT entry
>>>>>> configured to
>>>>>> use the ExtINT delivery mode."
>>>>>>
>>>>>> Not saying this is the problem, but very well could be.
>>>>>
>>>>> That sounds like a likely candidate, esp. also since this only happens
>>>>> once per CPU when we first only the CPU.
>>>>>
>>>>> Can you provide me with a patch with some printk-s / pr_debugs to
>>>>> test for this, then I can build a kernel with that patch added and
>>>>> we can see if your hypothesis is right.
>>>>
>>>> Ping? I like your theory, can you provide some help with debugging this
>>>> further (to prove that your theory is correct ) ?
>>>
>>> It's been a very long time since I dealt with this and I was only on the
>>> periphery. You might be able to print the LVT entries from the APIC and
>>> see if any of them have an un-masked ExtINT delivery mode. You would need
>>> to do this very early before Linux modifies any values.
>>
>> I'm afraid I'm not familiar enough with the interrupt / APIC parts of
>> the kernel to do something like this myself.
>>
>>> Or you can report the issue to the OEM and have them check their BIOS
>>> code to see if they are doing this.
>>
>> I will try to go this route, but I'm not really hopeful that will
>> lead to a solution.
>
> A similar issue is also reported here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>
> There are multiple people with different vectors (so likely / possibly
> different bugs) commenting on that bug, but I just got confirmation
> that the vector 55 issue is also happening on an Acer system with an AMD
> A8 processor (I suspect a Ryzen, but that still needs to be confirmed).
>
> So this seems to be a generic issue with (some) AMD laptops and
> not specific to one OEM.

I also see that comment 17 is for an Intel based machine, which to me
implies that it really is a BIOS issue.

Thanks,
Tom

>
> Regards,
>
> Hans

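The check suggested in the quoted text ("see if any of them have an un-masked ExtINT delivery mode") comes down to decoding two fields of a raw LVT register value. The following is a minimal sketch under stated assumptions: the helper name and the macro names are invented for illustration, and the bit positions (delivery mode in bits 10:8 with 111b meaning ExtINT, mask in bit 16) follow the LVT layout in the Intel SDM. In a kernel debug patch the raw value would presumably be read with something like `apic_read(APIC_LVT0)` early in AP bring-up; that placement is an assumption, not something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names, not the kernel's own macros. */
#define LVT_DELIVERY_MODE(x)	(((x) >> 8) & 0x7)	/* bits 10:8 */
#define LVT_MODE_EXTINT		0x7			/* 111b = ExtINT */
#define LVT_MASKED		(1u << 16)		/* bit 16 = masked */

/* Returns true if a raw LVT entry is ExtINT and not masked. */
bool lvt_is_unmasked_extint(uint32_t lvt)
{
	return LVT_DELIVERY_MODE(lvt) == LVT_MODE_EXTINT &&
	       !(lvt & LVT_MASKED);
}
```

If Tom's theory holds, such a check would be expected to return true for LINT0 on the APs before Linux reprograms the local APIC, while per the SDM quote only one processor should be configured that way.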
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-05 19:31 ` Lendacky, Thomas
@ 2019-03-05 19:40 ` Hans de Goede
  2019-03-05 19:54 ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Hans de Goede @ 2019-03-05 19:40 UTC
  To: Lendacky, Thomas, Thomas Gleixner
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov

Hi,

On 05-03-19 20:31, Lendacky, Thomas wrote:
> On 3/5/19 1:19 PM, Hans de Goede wrote:
>> Hi,
>>
>> On 05-03-19 17:02, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 05-03-19 15:06, Lendacky, Thomas wrote:
>>>> On 3/3/19 4:57 AM, Hans de Goede wrote:
>>>>> Hi,
>>>>>
>>>>> On 21-02-19 13:30, Hans de Goede wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 19-02-19 22:47, Lendacky, Thomas wrote:
>>>>>>> On 2/19/19 3:01 PM, Thomas Gleixner wrote:
>>>>>>>> Hans,
>>>>>>>>
>>>>>>>> On Tue, 19 Feb 2019, Hans de Goede wrote:
>>>>>>>>
>>>>>>>> Cc+: ACPI/AMD folks
>>>>>>>>
>>>>>>>>> Various people are reporting false positive "do_IRQ: #.55 No irq
>>>>>>>>> handler for
>>>>>>>>> vector"
>>>>>>>>> messages on AMD ryzen based laptops, see e.g.:
>>>>>>>>>
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>>>>>>>>
>>>>>>>>> Which contains this dmesg snippet:
>>>>>>>>>
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up
>>>>>>>>> secondary CPUs
>>>>>>>>> ...
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP
>>>>>>>>> configuration:
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0,
>>>>>>>>> CPUs: #1
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq
>>>>>>>>> handler for
>>>>>>>>> vector
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #2
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq
>>>>>>>>> handler for
>>>>>>>>> vector
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: #3
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq
>>>>>>>>> handler for
>>>>>>>>> vector
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node,
>>>>>>>>> 4 CPUs
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical
>>>>>>>>> packages: 1
>>>>>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4
>>>>>>>>> processors
>>>>>>>>> activated (15968.49 BogoMIPS)
>>>>>>>>>
>>>>>>>>> It seems that we get an IRQ for each CPU as we bring it online,
>>>>>>>>> which feels to me like it is some sorta false-positive.
>>>>>>>>
>>>>>>>> Sigh, that looks like BIOS value add again.
>>>>>>>>
>>>>>>>> It's not a false positive. Something _IS_ sending a vector 55 to these
>>>>>>>> CPUs
>>>>>>>> for whatever reason.
>>>>>>>>
>>>>>>>
>>>>>>> I remember seeing something like this in the past and it turned out
>>>>>>> to be
>>>>>>> a BIOS issue. BIOS was enabling the APs to interact with the legacy
>>>>>>> 8259
>>>>>>> interrupt controller when only the BSP should. During POST the APs were
>>>>>>> exposed to ExtINT/INTR events as a result of the mis-configuration
>>>>>>> (probably due to a UEFI timer-tick using the 8259) and this left a
>>>>>>> pending
>>>>>>> ExtINT/INTR interrupt latched on the APs.
>>>>>>>
>>>>>>> When the APs were started by the OS, the latched ExtINT/INTR
>>>>>>> interrupt is
>>>>>>> processed shortly after the OS enables interrupts. The AP then
>>>>>>> queries the
>>>>>>> 8259 to identify the vector number (which is the value of the 8259's
>>>>>>> ICW2
>>>>>>> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
>>>>>>> since no interrupts are actually pending, the 8259 will respond with
>>>>>>> IRQ7
>>>>>>> (spurious interrupt) yielding a vector of 0x37 or 55.
>>>>>>>
>>>>>>> The OS was not expecting vector 55 and printed the message.
>>>>>>>
>>>>>>> From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
>>>>>>> "Only one processor in the system should have an LVT entry
>>>>>>> configured to
>>>>>>> use the ExtINT delivery mode."
>>>>>>>
>>>>>>> Not saying this is the problem, but very well could be.
>>>>>>
>>>>>> That sounds like a likely candidate, esp. also since this only happens
>>>>>> once per CPU when we first only the CPU.
>>>>>>
>>>>>> Can you provide me with a patch with some printk-s / pr_debugs to
>>>>>> test for this, then I can build a kernel with that patch added and
>>>>>> we can see if your hypothesis is right.
>>>>>
>>>>> Ping? I like your theory, can you provide some help with debugging this
>>>>> further (to prove that your theory is correct ) ?
>>>>
>>>> It's been a very long time since I dealt with this and I was only on the
>>>> periphery. You might be able to print the LVT entries from the APIC and
>>>> see if any of them have an un-masked ExtINT delivery mode. You would need
>>>> to do this very early before Linux modifies any values.
>>>
>>> I'm afraid I'm not familiar enough with the interrupt / APIC parts of
>>> the kernel to do something like this myself.
>>>
>>>> Or you can report the issue to the OEM and have them check their BIOS
>>>> code to see if they are doing this.
>>>
>>> I will try to go this route, but I'm not really hopeful that will
>>> lead to a solution.
>>
>> A similar issue is also reported here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>
>> There are multiple people with different vectors (so likely / possibly
>> different bugs) commenting on that bug, but I just got confirmation
>> that the vector 55 issue is also happening on an Acer system with an AMD
>> A8 processor (I suspect a Ryzen, but that still needs to be confirmed).
>>
>> So this seems to be a generic issue with (some) AMD laptops and
>> not specific to one OEM.
>
> I also see that comment 17 is for an Intel based machine, which to me
> implies that it really is a BIOS issue.

That user is seeing "No irq handler for vector" on vectors 33-35 so
that is likely / possibly another bug.

Finger pointing at the firmware if there are multiple vendors involved
is really not going to help here. Esp. since most OEMs will just respond
with "the machine works fine with Windows"

Regards,

Hans

* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-05 19:40 ` Hans de Goede
@ 2019-03-05 19:54 ` Borislav Petkov
  2019-03-06  8:41 ` Hans de Goede
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2019-03-05 19:54 UTC
  To: Hans de Goede
  Cc: Lendacky, Thomas, Thomas Gleixner, Linux Kernel Mailing List, Rafael J. Wysocki

On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
> Finger pointing at the firmware if there are multiple vendors involved
> is really not going to help here. Esp. since most OEMs will just respond
> with "the machine works fine with Windows"

Yes, because windoze simply doesn't report that spurious IRQ, most
likely.

Firmware is fiddling with some crap underneath and it ends up raising
IRQs. tglx told you that too.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-05 19:54 ` Borislav Petkov
@ 2019-03-06  8:41 ` Hans de Goede
  2019-03-06 10:14 ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Hans de Goede @ 2019-03-06 8:41 UTC
  To: Borislav Petkov
  Cc: Lendacky, Thomas, Thomas Gleixner, Linux Kernel Mailing List, Rafael J. Wysocki

Hi,

On 05-03-19 20:54, Borislav Petkov wrote:
> On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
>> Finger pointing at the firmware if there are multiple vendors involved
>> is really not going to help here. Esp. since most OEMs will just respond
>> with "the machine works fine with Windows"
>
> Yes, because windoze simply doesn't report that spurious IRQ, most
> likely.

So maybe we need to lower the priority of the do_IRQ error from pr_emerg
to pr_err then ? That will stop throwing the errors in the user's face each
boot on distros which have chosen to set the quiet loglevel to such a level
that pr_err messages are not shown on the console (*).

Regards,

Hans

*) Since there are simply too many false-positive pr_err messages
in the kernel, try e.g. plugging in a usb-stick and then do
"dmesg --level=err"

Note the messages will still be in dmesg and in the system logs

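The effect of the proposed pr_emerg to pr_err change can be illustrated with the console filtering rule: the kernel prints a message to the console only when its level is numerically lower than the console loglevel. The sketch below is a userspace illustration, not kernel code. The function and enum names are made up, and the quiet loglevel of 3 is an assumed example of a distro setting "such a level that pr_err messages are not shown"; only the numeric levels (emerg = 0, err = 3) are the kernel's standard ones.

```c
#include <stdbool.h>

/* Standard kernel severity numbers; the enum names here are illustrative. */
enum msg_level {
	LVL_EMERG = 0,	/* pr_emerg() */
	LVL_ERR   = 3,	/* pr_err()   */
};

/* A message reaches the console only if its level is below console_loglevel. */
bool shown_on_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

Under that assumed quiet loglevel of 3, `shown_on_console(LVL_EMERG, 3)` is true while `shown_on_console(LVL_ERR, 3)` is false, so the message would drop off the boot console but still be recorded in dmesg and the system logs.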
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-06  8:41 ` Hans de Goede
@ 2019-03-06 10:14 ` Thomas Gleixner
  2019-03-07 11:20 ` Hans de Goede
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2019-03-06 10:14 UTC
  To: Hans de Goede
  Cc: Borislav Petkov, Lendacky, Thomas, Linux Kernel Mailing List, Rafael J. Wysocki

Hans,

On Wed, 6 Mar 2019, Hans de Goede wrote:
> On 05-03-19 20:54, Borislav Petkov wrote:
> > On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
> > > Finger pointing at the firmware if there are multiple vendors involved
> > > is really not going to help here. Esp. since most OEMs will just respond
> > > with "the machine works fine with Windows"
> >
> > Yes, because windoze simply doesn't report that spurious IRQ, most
> > likely.
>
> So maybe we need to lower the priority of the do_IRQ error from pr_emerg
> to pr_err then ? That will stop throwing the errors in the users face each
> boot on distros which have chosen to set the quiet loglevel to such a level
> that pr_err messages are not shown on the console (*).

Well, we rather try to understand and fix the issue.

So if Tom's theory holds, then the patch below should cure it.

Thanks,

	tglx

8<---------------------

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
  */
 void apic_ap_setup(void)
 {
+	clear_local_APIC();
 	setup_local_APIC();
 	end_local_APIC_setup();
 }

* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-03-06 10:14 ` Thomas Gleixner
@ 2019-03-07 11:20 ` Hans de Goede
  0 siblings, 0 replies; 16+ messages in thread
From: Hans de Goede @ 2019-03-07 11:20 UTC
  To: Thomas Gleixner
  Cc: Borislav Petkov, Lendacky, Thomas, Linux Kernel Mailing List, Rafael J. Wysocki

Hi,

On 06-03-19 11:14, Thomas Gleixner wrote:
> Hans,
>
> On Wed, 6 Mar 2019, Hans de Goede wrote:
>> On 05-03-19 20:54, Borislav Petkov wrote:
>>> On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
>>>> Finger pointing at the firmware if there are multiple vendors involved
>>>> is really not going to help here. Esp. since most OEMs will just respond
>>>> with "the machine works fine with Windows"
>>>
>>> Yes, because windoze simply doesn't report that spurious IRQ, most
>>> likely.
>>
>> So maybe we need to lower the priority of the do_IRQ error from pr_emerg
>> to pr_err then ? That will stop throwing the errors in the users face each
>> boot on distros which have chosen to set the quiet loglevel to such a level
>> that pr_err messages are not shown on the console (*).
>
> Well, we rather try to understand and fix the issue.
>
> So if Tom's theory holds, then the patch below should cure it.

Thank you for the patch, unfortunately the messages still happen with
a kernel with the patch applied:

[    0.741479] smp: Bringing up secondary CPUs ...
[    0.741654] x86: Booting SMP configuration:
[    0.741655] .... node #0, CPUs: #1
[    0.742231] TSC synchronization [CPU#0 -> CPU#1]:
[    0.742231] Measured 3346474670 cycles TSC warp between CPUs, turning off TSC clock.
[    0.742231] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.321639] do_IRQ: 1.55 No irq handler for vector
[    0.743371] #2
[    0.321639] do_IRQ: 2.55 No irq handler for vector
[    0.743598] #3
[    0.321639] do_IRQ: 3.55 No irq handler for vector
[    0.744306] #4
[    0.321639] do_IRQ: 4.55 No irq handler for vector
[    0.744531] #5
[    0.321639] do_IRQ: 5.55 No irq handler for vector
[    0.745241] #6
[    0.321639] do_IRQ: 6.55 No irq handler for vector
[    0.745467] #7
[    0.321639] do_IRQ: 7.55 No irq handler for vector
[    0.745627] smp: Brought up 1 node, 8 CPUs
[    0.745627] smpboot: Max logical packages: 2
[    0.745627] smpboot: Total of 8 processors activated (35133.37 BogoMIPS)

I also tried suspend/resume. In that case there are no extra
"No irq handler for vector" printed, this seems to only trigger
once per CPU on boot only.

I do get these messages during resume, but I guess these are unrelated:

[  167.034247] ACPI: Low-level resume complete
[  167.034247] ACPI: EC: EC started
[  167.034247] PM: Restoring platform NVS memory
[  167.034247] Enabling non-boot CPUs ...
[  167.034247] x86: Booting SMP configuration:
[  167.034247] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  167.034247] cache: parent cpu1 should not be sleeping
[  167.034281] microcode: CPU1: patch_level=0x08101007
[  167.034542] CPU1 is up
[  167.034583] smpboot: Booting Node 0 Processor 2 APIC 0x2
[  167.035347] cache: parent cpu2 should not be sleeping
[  167.035484] microcode: CPU2: patch_level=0x08101007
[  167.035690] CPU2 is up
[  167.035703] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  167.036447] cache: parent cpu3 should not be sleeping
[  167.036580] microcode: CPU3: patch_level=0x08101007
[  167.036819] CPU3 is up
[  167.036843] smpboot: Booting Node 0 Processor 4 APIC 0x4
[  167.038227] cache: parent cpu4 should not be sleeping
[  167.038384] microcode: CPU4: patch_level=0x08101007
etc.

Regards,

Hans

> 8<---------------------
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
>   */
>  void apic_ap_setup(void)
>  {
> +	clear_local_APIC();
>  	setup_local_APIC();
>  	end_local_APIC_setup();
>  }
>

* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
  2019-02-19 21:01 ` Thomas Gleixner
  2019-02-19 21:47 ` Lendacky, Thomas
@ 2019-02-21 12:28 ` Hans de Goede
  1 sibling, 0 replies; 16+ messages in thread
From: Hans de Goede @ 2019-02-21 12:28 UTC
  To: Thomas Gleixner
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Borislav Petkov, Tom Lendacky

Hi,

On 19-02-19 22:01, Thomas Gleixner wrote:
> Hans,
>
> On Tue, 19 Feb 2019, Hans de Goede wrote:
>
> Cc+: ACPI/AMD folks
>
>> Various people are reporting false positive "do_IRQ: #.55 No irq handler for
>> vector"
>> messages on AMD ryzen based laptops, see e.g.:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>
>> Which contains this dmesg snippet:
>>
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs
>> ...
>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: #2
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: #3
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors
>> activated (15968.49 BogoMIPS)
>>
>> It seems that we get an IRQ for each CPU as we bring it online,
>> which feels to me like it is some sorta false-positive.
>
> Sigh, that looks like BIOS value add again.
>
> It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
> for whatever reason.
>
>> I temporarily have access to a loaner laptop for a couple of weeks which shows
>> the same errors and I would like to fix this, but I don't really know how to
>> fix this.
>
> Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
> whether vector 55 is used on CPU0 and which device is associated to that.

ls /sys/kernel/debug/irq/domains

gives:

AMD-IR-0        IO-APIC-IR-0  PCI-MSI-3  default
AMD-IR-MSI-0-3  IO-APIC-IR-1  VECTOR

None of the files under /sys/kernel/debug/irq/domains list 55 under
the "vectors" column of their output. The part with the vectors
column is identical for all of them and looks like this for all of them:

 | CPU | avl | man | mac | act | vectors
     0   195     1     1     6  33-37,48
     1   195     1     1     6  33-38
     2   195     1     1     6  33-38
     3   195     1     1     6  33-38
     4   195     1     1     6  33-38
     5   195     1     1     6  33-38
     6   195     1     1     6  33-38
     7   195     1     1     6  33-38

cat /sys/kernel/debug/irq/irqs/55

Gives:

handler:  handle_fasteoi_irq
device:   (null)
status:   0x00004100
istate:   0x00000000
ddepth:   1
wdepth:   0
dstate:   0x0503a000
            IRQD_LEVEL
            IRQD_IRQ_DISABLED
            IRQD_IRQ_MASKED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_CAN_RESERVE
node:     -1
affinity: 0-15
effectiv: 0
pending:
domain:  IO-APIC-IR-1
 hwirq:   0x0
 chip:    IR-IO-APIC
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  AMD-IR-0
     hwirq:   0x10000
     chip:    AMD-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x37
         chip:    APIC
          flags:   0x0
         Vector:     0
         Target:     0
         move_in_progress: 0
         is_managed:       0
         can_reserve:      1
         has_reserved:     1
         cleanup_pending:  0

cat /proc/interrupts

Gives:

        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
   0:    123      0      0      0      0      0      0      0  IR-IO-APIC 2-edge timer
   1:      0      0      0      0      0      0    188      0  IR-IO-APIC 1-edge i8042
   8:      0      0      0      0      0      0      0      1  IR-IO-APIC 8-edge rtc0
   9:      0   6564      0      0      0      0      0      0  IR-IO-APIC 9-fasteoi acpi
  12:      0      0      0      0      0    511      0      0  IR-IO-APIC 12-edge i8042
  25:      0      0      0      0      0      0      0      0  PCI-MSI 4096-edge AMD-Vi
  26:      0      0      0      0      0      0      0      0  IR-PCI-MSI 18432-edge PCIe PME, aerdrv
  27:      0      0      0      0      0      0      0      0  IR-PCI-MSI 20480-edge PCIe PME, aerdrv
  28:      0      0      0      0      0      0      0      0  IR-PCI-MSI 22528-edge PCIe PME, aerdrv
  29:      0      0      0      0      0      0      0      0  IR-PCI-MSI 24576-edge PCIe PME, aerdrv
  30:      0      0      0      0      0      0      0      0  IR-PCI-MSI 26624-edge PCIe PME, aerdrv
  31:      0      0      0      0      0      0      0      0  IR-PCI-MSI 28672-edge PCIe PME, aerdrv
  32:      0      0      0      0      0      0      0      0  IR-PCI-MSI 133120-edge PCIe PME
  33:      0      0      0      0      0      0      0      0  IR-PCI-MSI 135168-edge PCIe PME
  35:      0      0      0      0      0      0      0      0  IR-PCI-MSI 4194304-edge ahci[0000:08:00.0]
  36:      0      0      0      0      0      0      0      0  IR-IO-APIC 15-fasteoi ehci_hcd:usb1
  38:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676160-edge xhci_hcd
  39:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676161-edge xhci_hcd
  40:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676162-edge xhci_hcd
  41:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676163-edge xhci_hcd
  42:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676164-edge xhci_hcd
  43:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676165-edge xhci_hcd
  44:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676166-edge xhci_hcd
  45:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3676167-edge xhci_hcd
  47:      0      0      0      0      0    623      0      0  IR-PCI-MSI 3678208-edge xhci_hcd
  48:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678209-edge xhci_hcd
  49:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678210-edge xhci_hcd
  50:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678211-edge xhci_hcd
  51:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678212-edge xhci_hcd
  52:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678213-edge xhci_hcd
  53:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678214-edge xhci_hcd
  54:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3678215-edge xhci_hcd
  56:     22      0      0      0      0      0      0      0  IR-PCI-MSI 524288-edge rtsx_pci
  58:      0     37      0      0      0      0      0      0  IR-PCI-MSI 1572864-edge nvme0q0
  59:   3838      0      0      0      0      0      0      0  IR-PCI-MSI 1572865-edge nvme0q1
  60:      0   2036      0      0      0      0      0      0  IR-PCI-MSI 1572866-edge nvme0q2
  61:      0      0   3525      0      0      0      0      0  IR-PCI-MSI 1572867-edge nvme0q3
  62:      0      0      0   5013      0      0      0      0  IR-PCI-MSI 1572868-edge nvme0q4
  63:      0      0      0      0   3025      0      0      0  IR-PCI-MSI 1572869-edge nvme0q5
  64:      0      0      0      0      0   2271      0      0  IR-PCI-MSI 1572870-edge nvme0q6
  65:      0      0      0      0      0      0   3948      0  IR-PCI-MSI 1572871-edge nvme0q7
  66:      0      0      0      0      0      0      0   2094  IR-PCI-MSI 1572872-edge nvme0q8
  67:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572873-edge nvme0q9
  68:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572874-edge nvme0q10
  69:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572875-edge nvme0q11
  70:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572876-edge nvme0q12
  71:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572877-edge nvme0q13
  72:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572878-edge nvme0q14
  73:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572879-edge nvme0q15
  74:      0      0      0      0      0      0      0      0  IR-PCI-MSI 1572880-edge nvme0q16
  75:      0      0   7598      0      0      0      0      0  IR-PCI-MSI 3670016-edge amdgpu
  77:      0      0      0      0      0      0      0      0  IR-PCI-MSI 2097152-edge enp4s0f0
  79:      0      0      0      0      0      0      0      0  IR-PCI-MSI 3145728-edge enp6s0
  81:      0      0      0    527      0      0      0      0  IR-PCI-MSI 3672064-edge snd_hda_intel:card0
  82:      0      0      0      0    930      0      0      0  IR-PCI-MSI 3682304-edge snd_hda_intel:card1
  84:      0      0      0      0      0  15493      0      0  IR-PCI-MSI 1048576-edge r8822be
 NMI:      2      1      1      1      1      1      1      1  Non-maskable interrupts
 LOC:  55193  40080  52795  34289  48822  42298  57746  33306  Local timer interrupts
 SPU:      0      0      0      0      0      0      0      0  Spurious interrupts
 PMI:      2      1      1      1      1      1      1      1  Performance monitoring interrupts
 IWI:  15286  10090  14311   9249  13054  23194  13384   9842  IRQ work interrupts
 RTR:      0      0      0      0      0      0      0      0  APIC ICR read retries
 RES:  26829  14012  14311   8544  12130   6480  13649   6414  Rescheduling interrupts
 CAL:  15273  18572  16350  18090  14929  18234  17090  17644  Function call interrupts
 TLB:   5771   5218   5098   5248   5571   3619   8354   5405  TLB shootdowns
 TRM:      0      0      0      0      0      0      0      0  Thermal event interrupts
 THR:      0      0      0      0      0      0      0      0  Threshold APIC interrupts
 DFR:      0      0      0      0      0      0      0      0  Deferred Error APIC interrupts
 MCE:      0      0      0      0      0      0      0      0  Machine check exceptions
 MCP:      5      5      5      5      5      5      5      5  Machine check polls
 HYP:      0      0      0      0      0      0      0      0  Hypervisor callback interrupts
 HRE:      0      0      0      0      0      0      0      0  Hyper-V reenlightenment interrupts
 HVS:      0      0      0      0      0      0      0      0  Hyper-V stimer0 interrupts
 ERR:      0
 MIS:      0
 PIN:      0      0      0      0      0      0      0      0  Posted-interrupt notification event
 NPI:      0      0      0      0      0      0      0      0  Nested posted-interrupt event
 PIW:      0      0      0      0      0      0      0      0  Posted-interrupt wakeup event

> I bet its a legacy IRQ and as that space starts at 48 (IRQ0) this should be
> IRQ9 which is usually - DRUMROLL - the ACPI interrupt.
>
> The kernel clearly sets that up to be delivered to CPU 0 only, but I've
> seen that before that the BIOS value add thinks that this setup is not
> relevant.
>
> /me goes off and sings LALALA

Regards,

Hans

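Checking by eye whether a vector appears in the "vectors" column of the debugfs output is easy to get wrong once the lists grow. The following is a small userspace sketch of that check; `vector_in_list()` is a hypothetical helper written for this illustration, not part of the kernel or any existing tool, and it assumes the column uses the comma-separated range format shown above (e.g. "33-37,48").

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns true if vec is covered by a list such as
 * "33-37,48" (comma-separated decimal numbers or lo-hi ranges).
 */
bool vector_in_list(const char *list, long vec)
{
	const char *p = list;

	while (*p) {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)
			return false;	/* no number found: malformed input */
		if (*end == '-') {
			const char *q = end + 1;

			hi = strtol(q, &end, 10);
			if (end == q)
				return false;	/* range without upper bound */
		}
		if (vec >= lo && vec <= hi)
			return true;
		if (*end != ',')
			break;		/* end of list */
		p = end + 1;
	}
	return false;
}
```

For the CPU 0 row above, `vector_in_list("33-37,48", 55)` returns false, which is consistent with the observation that none of the domain files list vector 55.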
* Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops
@ 2021-01-09  5:50 Christopher William Snowhill
  0 siblings, 0 replies; 16+ messages in thread
From: Christopher William Snowhill @ 2021-01-09 5:50 UTC
  To: linux-kernel

Replying to https://lkml.org/lkml/2019/2/19/516 from yes, 2019.

My MSI B450 Tomahawk is exhibiting this bug now that I've updated the
firmware to the latest beta BIOS with AGESA 1.1.0.0 patch D.

Thread overview: 16+ messages
2019-02-19 15:53 False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops Hans de Goede
2019-02-19 21:01 ` Thomas Gleixner
2019-02-19 21:47   ` Lendacky, Thomas
2019-02-21 12:30     ` Hans de Goede
2019-03-03 10:57       ` Hans de Goede
2019-03-05 14:06         ` Lendacky, Thomas
2019-03-05 16:02           ` Hans de Goede
2019-03-05 19:19             ` Hans de Goede
2019-03-05 19:31               ` Lendacky, Thomas
2019-03-05 19:40                 ` Hans de Goede
2019-03-05 19:54                   ` Borislav Petkov
2019-03-06  8:41                     ` Hans de Goede
2019-03-06 10:14                       ` Thomas Gleixner
2019-03-07 11:20                         ` Hans de Goede
2019-02-21 12:28   ` Hans de Goede
2021-01-09  5:50 Christopher William Snowhill