* [PATCH] gpio: pca953x: Survive spurious interrupts
From: Marc Zyngier @ 2020-10-05 14:02 UTC
To: linux-gpio, linux-kernel
Cc: Linus Walleij, Bartosz Golaszewski, kernel-team

The pca953x driver never checks the result of irq_find_mapping(),
which returns 0 when no mapping is found. When a spurious interrupt
is delivered (which can happen under obscure circumstances), the
kernel explodes as it still tries to handle the error code as
a real interrupt.

Handle this particular case and warn on spurious interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/gpio/gpio-pca953x.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index fb61f2fc6ed7..c2d6121c48c9 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -824,8 +824,21 @@ static irqreturn_t pca953x_irq_handler(int irq, void *devid)
 	ret = pca953x_irq_pending(chip, pending);
 	mutex_unlock(&chip->i2c_lock);
 
-	for_each_set_bit(level, pending, gc->ngpio)
-		handle_nested_irq(irq_find_mapping(gc->irq.domain, level));
+	if (ret) {
+		ret = 0;
+
+		for_each_set_bit(level, pending, gc->ngpio) {
+			int nested_irq = irq_find_mapping(gc->irq.domain, level);
+
+			if (unlikely(nested_irq <= 0)) {
+				dev_warn_ratelimited(gc->parent, "unmapped interrupt %d\n", level);
+				continue;
+			}
+
+			handle_nested_irq(nested_irq);
+			ret = 1;
+		}
+	}
 
 	return IRQ_RETVAL(ret);
 }
-- 
2.28.0
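The control flow the patch introduces can be tried outside the kernel with a small userspace model. This is only a sketch under stated assumptions: `mock_domain`, `mock_find_mapping()`, `mock_handle_nested_irq()` and `dispatch_pending()` are hypothetical stand-ins, not kernel APIs, and a plain `unsigned long` replaces the kernel bitmap. It shows the two points of the fix: an unmapped line is skipped with a warning, and the handler reports "handled" only if at least one nested interrupt was actually dispatched.

```c
#include <assert.h>
#include <stdio.h>

#define NGPIO 16 /* hypothetical number of lines on the expander */

/* Hypothetical stand-in for an IRQ domain: hwirq -> IRQ number, 0 = unmapped. */
static const unsigned int mock_domain[NGPIO] = { [0] = 40, [3] = 43 };

static unsigned int nested_handled; /* calls made to the mock nested handler */

/* Mock of irq_find_mapping(): returns 0 when no mapping exists. */
static unsigned int mock_find_mapping(unsigned int hwirq)
{
	assert(hwirq < NGPIO);
	return mock_domain[hwirq];
}

/* Mock of handle_nested_irq(): must never be handed IRQ 0. */
static void mock_handle_nested_irq(unsigned int irq)
{
	assert(irq != 0);
	nested_handled++;
}

/* Returns 1 if at least one pending line was dispatched, 0 otherwise. */
static int dispatch_pending(unsigned long pending)
{
	int handled = 0;

	for (unsigned int level = 0; level < NGPIO; level++) {
		unsigned int nested_irq;

		if (!(pending & (1UL << level)))
			continue;

		nested_irq = mock_find_mapping(level);
		if (nested_irq == 0) {
			/* Spurious: warn and skip instead of dispatching IRQ 0. */
			fprintf(stderr, "unmapped interrupt %u\n", level);
			continue;
		}

		mock_handle_nested_irq(nested_irq);
		handled = 1;
	}

	return handled;
}
```

In this model, a pending word with only an unmapped bit set yields 0, which corresponds to the handler returning IRQ_NONE rather than claiming an interrupt it never serviced.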
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Linus Walleij @ 2020-10-07 9:48 UTC
To: Marc Zyngier
Cc: open list:GPIO SUBSYSTEM, linux-kernel, Bartosz Golaszewski, kernel-team

On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:

> The pca953x driver never checks the result of irq_find_mapping(),
> which returns 0 when no mapping is found. When a spurious interrupt
> is delivered (which can happen under obscure circumstances), the
> kernel explodes as it still tries to handle the error code as
> a real interrupt.
>
> Handle this particular case and warn on spurious interrupts.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Patch applied for fixes.

Yours,
Linus Walleij
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Andy Shevchenko @ 2020-10-07 12:02 UTC
To: Linus Walleij
Cc: Marc Zyngier, open list:GPIO SUBSYSTEM, linux-kernel, Bartosz Golaszewski, kernel-team

On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
>
> > The pca953x driver never checks the result of irq_find_mapping(),
> > which returns 0 when no mapping is found. When a spurious interrupt
> > is delivered (which can happen under obscure circumstances), the
> > kernel explodes as it still tries to handle the error code as
> > a real interrupt.
> >
> > Handle this particular case and warn on spurious interrupts.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>

Wait, doesn't actually [1] fix the reported issue?
Marc, can you confirm this?

[1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Marc Zyngier @ 2020-10-07 12:09 UTC
To: Andy Shevchenko
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, linux-kernel, Bartosz Golaszewski, kernel-team

On 2020-10-07 13:02, Andy Shevchenko wrote:
> On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
> <linus.walleij@linaro.org> wrote:
>>
>> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
>>
>> > The pca953x driver never checks the result of irq_find_mapping(),
>> > which returns 0 when no mapping is found. When a spurious interrupt
>> > is delivered (which can happen under obscure circumstances), the
>> > kernel explodes as it still tries to handle the error code as
>> > a real interrupt.
>> >
>> > Handle this particular case and warn on spurious interrupts.
>> >
>> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>
> Wait, doesn't actually [1] fix the reported issue?

Not at all.

> Marc, can you confirm this?
>
> [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")

Different bug, really. If an interrupt is *really* pending, and no
mapping established yet, feeding the result of irq_find_mapping() to
handle_nested_irq() will lead to a panic.

Recently seen on a Tegra system suffering from even more pathological
bugs.

        M.
-- 
Jazz is not dead. It just smells funny...
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Andy Shevchenko @ 2020-10-07 13:10 UTC
To: Marc Zyngier
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, Linux Kernel Mailing List, Bartosz Golaszewski, kernel-team

On Wed, Oct 7, 2020 at 3:09 PM Marc Zyngier <maz@kernel.org> wrote:
> On 2020-10-07 13:02, Andy Shevchenko wrote:
> > On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
> > <linus.walleij@linaro.org> wrote:
> >> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
> >>
> >> > The pca953x driver never checks the result of irq_find_mapping(),
> >> > which returns 0 when no mapping is found. When a spurious interrupt
> >> > is delivered (which can happen under obscure circumstances), the
> >> > kernel explodes as it still tries to handle the error code as
> >> > a real interrupt.
> >> >
> >> > Handle this particular case and warn on spurious interrupts.
> >> >
> >> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> >
> > Wait, doesn't actually [1] fix the reported issue?
>
> Not at all.
>
> > Marc, can you confirm this?
> >
> > [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")
>
> Different bug, really. If an interrupt is *really* pending, and no
> mapping established yet, feeding the result of irq_find_mapping() to
> handle_nested_irq() will lead to a panic.

I don't understand. We have plenty of drivers doing exactly the way
without checking this returned code.

What circumstances makes the mapping be absent?

Shouldn't we rather change this:

    girq->handler = handle_simple_irq;

to this:

    girq->handler = handle_bad_irq;

?

> Recently seen on a Tegra system suffering from even more pathological
> bugs.

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Marc Zyngier @ 2020-10-07 13:20 UTC
To: Andy Shevchenko
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, Linux Kernel Mailing List, Bartosz Golaszewski, kernel-team

On 2020-10-07 14:10, Andy Shevchenko wrote:
> On Wed, Oct 7, 2020 at 3:09 PM Marc Zyngier <maz@kernel.org> wrote:
>> On 2020-10-07 13:02, Andy Shevchenko wrote:
>> > On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
>> > <linus.walleij@linaro.org> wrote:
>> >> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
>> >>
>> >> > The pca953x driver never checks the result of irq_find_mapping(),
>> >> > which returns 0 when no mapping is found. When a spurious interrupt
>> >> > is delivered (which can happen under obscure circumstances), the
>> >> > kernel explodes as it still tries to handle the error code as
>> >> > a real interrupt.
>> >> >
>> >> > Handle this particular case and warn on spurious interrupts.
>> >> >
>> >> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>> >
>> > Wait, doesn't actually [1] fix the reported issue?
>>
>> Not at all.
>>
>> > Marc, can you confirm this?
>> >
>> > [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")
>>
>> Different bug, really. If an interrupt is *really* pending, and no
>> mapping established yet, feeding the result of irq_find_mapping() to
>> handle_nested_irq() will lead to a panic.
>
> I don't understand. We have plenty of drivers doing exactly the way
> without checking this returned code.

I'm sure we do. Most driver code is buggy as hell, but I don't see that
as a reason to cargo-cult the crap. The API is crystal clear that it can
return 0 for no mapping, and 0 isn't a valid interrupt.

> What circumstances makes the mapping be absent?

Other bugs in the system ([1]), spurious interrupts (which can *always*
happen).

> Shouldn't we rather change this:
>
>     girq->handler = handle_simple_irq;
> to this:
>     girq->handler = handle_bad_irq;
> ?

I don't understand what you are trying to achieve with that, apart from
maybe breaking the driver. The right way to handle spurious interrupts
is by telling the core code that the interrupt wasn't handled, and to
let the spurious interrupt code do its magic.

        M.

[1] https://lore.kernel.org/r/20201005111443.1390096-1-maz@kernel.org
-- 
Jazz is not dead. It just smells funny...
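The idea of reporting "not handled" and letting the core react can be illustrated with a deliberately simplified userspace model. This is a sketch, not the kernel's spurious-interrupt code: `struct irq_stats`, `note_result()` and both thresholds are hypothetical names and values chosen for illustration. The model counts how many results in a window were unhandled and gives up on the line only when nearly all of them were.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative thresholds: assumptions for this sketch, not the kernel's values. */
#define WINDOW          1000u
#define UNHANDLED_LIMIT  990u

struct irq_stats {
	unsigned int count;     /* interrupts seen in the current window */
	unsigned int unhandled; /* of those, how many nobody claimed */
	bool disabled;          /* set once the line is judged stuck */
};

/* Record one handler result; handled == false models a "not handled" return. */
static void note_result(struct irq_stats *s, bool handled)
{
	assert(s != NULL);

	if (s->disabled)
		return;
	if (!handled)
		s->unhandled++;
	if (++s->count < WINDOW)
		return;

	/* End of window: disable only if almost every interrupt went unclaimed. */
	if (s->unhandled > UNHANDLED_LIMIT)
		s->disabled = true;

	s->count = 0;
	s->unhandled = 0;
}
```

The point of the design is that an occasional unclaimed interrupt is tolerated, while a line that fires continuously without any handler claiming it is eventually shut off. That only works if drivers report unhandled interrupts honestly.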
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Andy Shevchenko @ 2020-10-07 14:03 UTC
To: Marc Zyngier
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, Linux Kernel Mailing List, Bartosz Golaszewski, kernel-team

On Wed, Oct 7, 2020 at 4:20 PM Marc Zyngier <maz@kernel.org> wrote:
> On 2020-10-07 14:10, Andy Shevchenko wrote:
> > On Wed, Oct 7, 2020 at 3:09 PM Marc Zyngier <maz@kernel.org> wrote:
> >> On 2020-10-07 13:02, Andy Shevchenko wrote:
> >> > On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
> >> > <linus.walleij@linaro.org> wrote:
> >> >> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
> >> >>
> >> >> > The pca953x driver never checks the result of irq_find_mapping(),
> >> >> > which returns 0 when no mapping is found. When a spurious interrupt
> >> >> > is delivered (which can happen under obscure circumstances), the
> >> >> > kernel explodes as it still tries to handle the error code as
> >> >> > a real interrupt.
> >> >> >
> >> >> > Handle this particular case and warn on spurious interrupts.
> >> >> >
> >> >> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> >> >
> >> > Wait, doesn't actually [1] fix the reported issue?
> >>
> >> Not at all.
> >>
> >> > Marc, can you confirm this?
> >> >
> >> > [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")
> >>
> >> Different bug, really. If an interrupt is *really* pending, and no
> >> mapping established yet, feeding the result of irq_find_mapping() to
> >> handle_nested_irq() will lead to a panic.
> >
> > I don't understand. We have plenty of drivers doing exactly the way
> > without checking this returned code.
>
> I'm sure we do. Most driver code is buggy as hell, but I don't see that
> as a reason to cargo-cult the crap. The API is crystal clear that it can
> return 0 for no mapping, and 0 isn't a valid interrupt.

Yes, and the problem here is that we got this response from IRQ core,
which we shouldn't.

> > What circumstances makes the mapping be absent?
>
> Other bugs in the system ([1]), spurious interrupts (which can *always*
> happen).
>
> > Shouldn't we rather change this:
> >
> >     girq->handler = handle_simple_irq;
> > to this:
> >     girq->handler = handle_bad_irq;
> > ?
>
> I don't understand what you are trying to achieve with that, apart from
> maybe breaking the driver. The right way to handle spurious interrupts
> is by telling the core code that the interrupt wasn't handled, and to
> let
> the spurious interrupt code do its magic.

handle_bad_irq() is exactly for handling spurious IRQs as far as we
believe documentation. So, by default the driver assigns (should
assign) handle_bad_irq() to all IRQs as a default handler. If, by any
chance, we got it, we already have a proper handler in place. The read
handler is assigned whenever the IRQ core is called to register it (by
means of ->irq_set_type() callback). My understanding that GPIO IRQ
drivers are designed (should be designed) in this way. The approach
will make us sure that we don't have spurious interrupts with assigned
handlers.

> [1] https://lore.kernel.org/r/20201005111443.1390096-1-maz@kernel.org

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Marc Zyngier @ 2020-10-07 15:00 UTC
To: Andy Shevchenko
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, Linux Kernel Mailing List, Bartosz Golaszewski, kernel-team

On 2020-10-07 15:03, Andy Shevchenko wrote:
> On Wed, Oct 7, 2020 at 4:20 PM Marc Zyngier <maz@kernel.org> wrote:
>> On 2020-10-07 14:10, Andy Shevchenko wrote:
>> > On Wed, Oct 7, 2020 at 3:09 PM Marc Zyngier <maz@kernel.org> wrote:
>> >> On 2020-10-07 13:02, Andy Shevchenko wrote:
>> >> > On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
>> >> > <linus.walleij@linaro.org> wrote:
>> >> >> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
>> >> >>
>> >> >> > The pca953x driver never checks the result of irq_find_mapping(),
>> >> >> > which returns 0 when no mapping is found. When a spurious interrupt
>> >> >> > is delivered (which can happen under obscure circumstances), the
>> >> >> > kernel explodes as it still tries to handle the error code as
>> >> >> > a real interrupt.
>> >> >> >
>> >> >> > Handle this particular case and warn on spurious interrupts.
>> >> >> >
>> >> >> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>> >> >
>> >> > Wait, doesn't actually [1] fix the reported issue?
>> >>
>> >> Not at all.
>> >>
>> >> > Marc, can you confirm this?
>> >> >
>> >> > [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")
>> >>
>> >> Different bug, really. If an interrupt is *really* pending, and no
>> >> mapping established yet, feeding the result of irq_find_mapping() to
>> >> handle_nested_irq() will lead to a panic.
>> >
>> > I don't understand. We have plenty of drivers doing exactly the way
>> > without checking this returned code.
>>
>> I'm sure we do. Most driver code is buggy as hell, but I don't see
>> that
>> as a reason to cargo-cult the crap. The API is crystal clear that it
>> can
>> return 0 for no mapping, and 0 isn't a valid interrupt.
>
> Yes, and the problem here is that we got this response from IRQ core,
> which we shouldn't.

What do you mean? There is no mapping at all. and all the core code
can tell you is exactly that. If you think that using an error code
as a valid input to another function is OK, we have a much bigger
problem.

>
>> > What circumstances makes the mapping be absent?
>>
>> Other bugs in the system ([1]), spurious interrupts (which can
>> *always*
>> happen).
>>
>> > Shouldn't we rather change this:
>> >
>> >     girq->handler = handle_simple_irq;
>> > to this:
>> >     girq->handler = handle_bad_irq;
>> > ?
>>
>> I don't understand what you are trying to achieve with that, apart
>> from
>> maybe breaking the driver. The right way to handle spurious interrupts
>> is by telling the core code that the interrupt wasn't handled, and to
>> let
>> the spurious interrupt code do its magic.
>
> handle_bad_irq() is exactly for handling spurious IRQs as far as we
> believe documentation. So, by default the driver assigns (should
> assign) handle_bad_irq() to all IRQs as a default handler. If, by any
> chance, we got it, we already have a proper handler in place. The read
> handler is assigned whenever the IRQ core is called to register it (by
> means of ->irq_set_type() callback). My understanding that GPIO IRQ
> drivers are designed (should be designed) in this way. The approach
> will make us sure that we don't have spurious interrupts with assigned
> handlers.

I can't see how setting this to anything else can work, given that
handle_nested_irq() knows nothing about this flow (it doesn't use any).

        M.
-- 
Jazz is not dead. It just smells funny...
* Re: [PATCH] gpio: pca953x: Survive spurious interrupts
From: Andy Shevchenko @ 2020-10-07 15:43 UTC
To: Marc Zyngier
Cc: Linus Walleij, open list:GPIO SUBSYSTEM, Linux Kernel Mailing List, Bartosz Golaszewski, kernel-team

On Wed, Oct 7, 2020 at 6:00 PM Marc Zyngier <maz@kernel.org> wrote:
> On 2020-10-07 15:03, Andy Shevchenko wrote:
> > On Wed, Oct 7, 2020 at 4:20 PM Marc Zyngier <maz@kernel.org> wrote:
> >> On 2020-10-07 14:10, Andy Shevchenko wrote:
> >> > On Wed, Oct 7, 2020 at 3:09 PM Marc Zyngier <maz@kernel.org> wrote:
> >> >> On 2020-10-07 13:02, Andy Shevchenko wrote:
> >> >> > On Wed, Oct 7, 2020 at 12:49 PM Linus Walleij
> >> >> > <linus.walleij@linaro.org> wrote:
> >> >> >> On Mon, Oct 5, 2020 at 4:02 PM Marc Zyngier <maz@kernel.org> wrote:
> >> >> >>
> >> >> >> > The pca953x driver never checks the result of irq_find_mapping(),
> >> >> >> > which returns 0 when no mapping is found. When a spurious interrupt
> >> >> >> > is delivered (which can happen under obscure circumstances), the
> >> >> >> > kernel explodes as it still tries to handle the error code as
> >> >> >> > a real interrupt.
> >> >> >> >
> >> >> >> > Handle this particular case and warn on spurious interrupts.
> >> >> >> >
> >> >> >> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> >> >> >
> >> >> > Wait, doesn't actually [1] fix the reported issue?
> >> >>
> >> >> Not at all.
> >> >>
> >> >> > Marc, can you confirm this?
> >> >> >
> >> >> > [1]: e43c26e12dd4 ("gpio: pca953x: Fix uninitialized pending variable")
> >> >>
> >> >> Different bug, really. If an interrupt is *really* pending, and no
> >> >> mapping established yet, feeding the result of irq_find_mapping() to
> >> >> handle_nested_irq() will lead to a panic.
> >> >
> >> > I don't understand. We have plenty of drivers doing exactly the way
> >> > without checking this returned code.
> >>
> >> I'm sure we do. Most driver code is buggy as hell, but I don't see
> >> that
> >> as a reason to cargo-cult the crap. The API is crystal clear that it
> >> can
> >> return 0 for no mapping, and 0 isn't a valid interrupt.
> >
> > Yes, and the problem here is that we got this response from IRQ core,
> > which we shouldn't.
>
> What do you mean? There is no mapping at all. and all the core code
> can tell you is exactly that. If you think that using an error code
> as a valid input to another function is OK, we have a much bigger
> problem.

Of course it's not okay. And that's what puzzles me. We shouldn't get
bit set in pending if there is no requested IRQ (handler assigned). I
think there is a bug indeed, but I'm not sure it is in the code you
are patching. Rather in the code when we are preparing a pending
bitmap. Shouldn't we have unused (unassigned interrupts) being masked
in the first place? I can imagine that we have the chip preconfigured
by firmware and when ->probe() happens the enabled IRQs should be left
untouched, but is it the case? I guess you are using a non-latched
version of the GPIO expander (I don't have such for a test). I need to
look at this closer...

Since Linus already applied this we will live with it now, but it
would be really helpful if you may dump the traces of non-working case
before this patch to analyze (I would like to see all regmap IO for
this chip).

> >> > What circumstances makes the mapping be absent?
> >>
> >> Other bugs in the system ([1]), spurious interrupts (which can
> >> *always*
> >> happen).
> >>
> >> > Shouldn't we rather change this:
> >> >
> >> >     girq->handler = handle_simple_irq;
> >> > to this:
> >> >     girq->handler = handle_bad_irq;
> >> > ?
> >>
> >> I don't understand what you are trying to achieve with that, apart
> >> from
> >> maybe breaking the driver. The right way to handle spurious interrupts
> >> is by telling the core code that the interrupt wasn't handled, and to
> >> let
> >> the spurious interrupt code do its magic.
> >
> > handle_bad_irq() is exactly for handling spurious IRQs as far as we
> > believe documentation. So, by default the driver assigns (should
> > assign) handle_bad_irq() to all IRQs as a default handler. If, by any
> > chance, we got it, we already have a proper handler in place. The read
> > handler is assigned whenever the IRQ core is called to register it (by
> > means of ->irq_set_type() callback). My understanding that GPIO IRQ
> > drivers are designed (should be designed) in this way. The approach
> > will make us sure that we don't have spurious interrupts with assigned
> > handlers.
>
> I can't see how setting this to anything else can work, given that
> handle_nested_irq() knows nothing about this flow (it doesn't use
> any).

-- 
With Best Regards,
Andy Shevchenko
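The question raised above, whether lines with no requested interrupt should be filtered out before the pending bitmap ever reaches the dispatch loop, can be sketched as follows. This is a hypothetical helper for illustration only, not the driver's pca953x_irq_pending(): `filter_pending()`, its parameters and the 32-bit width are all assumptions. It keeps only lines that changed state and whose interrupt is unmasked.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: report as pending only the lines that changed
 * between two input reads AND whose interrupt is unmasked.
 * ngpio is the number of valid lines (1..32 in this sketch).
 */
static uint32_t filter_pending(uint32_t old_stat, uint32_t new_stat,
			       uint32_t irq_mask, unsigned int ngpio)
{
	uint32_t valid;

	assert(ngpio > 0 && ngpio <= 32);
	valid = (ngpio == 32) ? UINT32_MAX : ((UINT32_C(1) << ngpio) - 1);

	return (old_stat ^ new_stat) & irq_mask & valid;
}
```

Filtering of this kind narrows the window for unmapped lines to show up as pending, but it does not remove the need to check the result of the mapping lookup. As noted earlier in the thread, spurious interrupts can always happen.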
Thread overview: 9+ messages (newest: 2020-10-07 15:42 UTC)

2020-10-05 14:02 [PATCH] gpio: pca953x: Survive spurious interrupts Marc Zyngier
2020-10-07  9:48 ` Linus Walleij
2020-10-07 12:02   ` Andy Shevchenko
2020-10-07 12:09     ` Marc Zyngier
2020-10-07 13:10       ` Andy Shevchenko
2020-10-07 13:20         ` Marc Zyngier
2020-10-07 14:03           ` Andy Shevchenko
2020-10-07 15:00             ` Marc Zyngier
2020-10-07 15:43               ` Andy Shevchenko