Hi, On 1/3/22 13:34, Jarkko Nikula wrote: > Hi > > On 1/3/22 12:42, Hans de Goede wrote: >> Hi Jarkko, >> >> On 1/3/22 10:42, Jarkko Nikula wrote: >>> Hi >>> >>> We have a Braswell based preproduction HW. I noticed linux-next tag next-20211224 doesn't boot on it due to following error: >>> >>> [   34.674271] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> [   34.682476] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> [   34.690681] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> ... >>> >>> Linux v5.16-rc8 is ok. I found the following commit to be reason for the regression: >>> >>> bdfbef2d29dc ("pinctrl: cherryview: Don't use selection 0 to mark an interrupt line as unused") >> >> Thank you for the timely headsup about this. >> >> I assume that you have tried a revert (if necessary including reverting some >> of the follow ups) and that reverting the patch you point to fixes the >> issue, right ? >> > Yes since linux-next has only these three commits below to pinctrl-cherryview.c that are not in v5.16-rc8 I reverted them one by one. I often try these kind of simple steps before going to more labor git bisect :-) > > db1b2a8caf5b pinctrl: cherryview: Use temporary variable for struct device > 07199dbf8cae pinctrl: cherryview: Do not allow the same interrupt line to be used by 2 pins > bdfbef2d29dc pinctrl: cherryview: Don't use selection 0 to mark an interrupt line as unused > > I also double checked by checking out to bdfbef2d29dc and tested that commit and it reverted. > >> Can you try the 2 attached patches *one at a time* ? : >> >> 1. Restores the old behavior of just triggering hwirq 0 of >> the pincontroller-domain when we don't know the mapping >> > Patch 1 does work and here's the output from modified error print: > > [   13.550781] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 > > If you want to go with patch 1 you may add my > Tested-by: Jarkko Nikula > >> 2. Restores the old behavior which allows chv_gpio_irq_startup() >> to overwrite the interrupt-line to pin mapping if the current >> mapping points to pin 0 >> > Patch 2 alone doesn't: > > [   24.977116] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > [   24.985314] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > [   24.993513] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > ... Ok, that is good news actually, I was already hoping that patch 1 would fix things. >> Both of these restore old behavior caused by a mapping-table >> entry containing 0 meaning both unset as well as HWIRQ0 >> before the patch in question. >> >> If applying them one at a time does not help, please also try with >> both applied. >> >> These 2 patches should apply cleanly on top of linux-next. >> >> If patch 2 fixes things it would be interesting if you can grab a >> dmesg with "pinctrl-cherryview.dyndbg" added to the command line >> (with the patched kernel). >> > With both applied it does work: > > # dmesg |grep cherryview-pinctrl > [   15.465425] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 > [   16.562282] cherryview-pinctrl INT33FF:03: using interrupt line 0 for pin 81 > [   17.824905] cherryview-pinctrl INT33FF:02: using interrupt line 0 for pin 22 > [   17.835757] cherryview-pinctrl INT33FF:03: using interrupt line 2 for pin 77 > [   17.850170] cherryview-pinctrl INT33FF:00: using interrupt line 0 for pin 76 Oh, that is actually interesting, this is a per gpio controller thing, so if we filter on the controller then we end up with: [ 15.465425] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 [ 17.850170] cherryview-pinctrl INT33FF:00: using interrupt line 0 for pin 76 So we do eventually get an IRQ request for a pin using the GPIO controller internal interrupt-line 0. But the IRQ triggers at least once before then and even though we haven't set a handler yet, calling generic_handle_irq for the GPIO-chips irqdomain, offset 0 IRQ does manage to silence the interrupt. I've been tracing this through the kernel code and AFAICT we end up in: arch/x86/kernel/irq.c: ack_bad_irq() in this case: Which means that this message should show up in dmesg: if (printk_ratelimit()) pr_err("unexpected IRQ trap at vector %02x\n", irq); Can you confirm this? Also can you share the full dmesg output of this device (with both patches, with dyndbg option) ? ### Note what we are seeing here is basically a spurious IRQ. Except on a few known broken devices the cherryview pinctrl code relies on the BIOS having configured things so that there are no spurious IRQs. I've attached a quick hack which activates the workaround for known broken devices unconditionally. This replace my previous 2 patches. I expect this to fix things too. If you can make some time to give this one a test too that would be great, then we have some options on how to fix this :) Regards, Hans