* [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever
@ 2012-02-06  8:14 Lothar Waßmann
  2012-02-06 10:42 ` Lars-Peter Clausen
  2012-02-07  9:03 ` Yong Zhang
  0 siblings, 2 replies; 12+ messages in thread
From: Lothar Waßmann @ 2012-02-06 8:14 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-arm-kernel, Thomas Gleixner

Hi,

I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
but did not get any response there. So resending to a wider audience
with improved subject line:

there is a race condition in the threaded IRQ handler code for oneshot
interrupts that may lead to disabling an IRQ indefinitely. IRQs are
masked before calling the hard-irq handler and are unmasked only after
the soft-irq handler has been run. Thus if the hard-irq handler
returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
will not be called, the interrupt will remain masked forever.

This can happen due to a short pulse on the interrupt line, that
triggers the interrupt logic, but goes undetected by the hard-irq
handler. The problem can be reproduced with the TSC2007 touch
controller driver that uses ONESHOT interrupts.

The problem arises also with interrupt controllers that latch a level
triggered IRQ until it is acknowledged (like the i.MX28 does).
In this case the IRQ status bit will remain asserted after the
soft-irq finishes and retrigger the interrupt while the interrupt line
is already deasserted.

The following patch would solve the problem, but I'm not sure whether
it's the Right Thing(TM) to do. Especially wrt. shared interrupts.

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c index 470d08c..93beadb 100644 --- a/kernel/irq/handle.c +++ b/kernel/irq/handle.c @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action) /* Fall through to add to randomness */ case IRQ_HANDLED: random |= action->flags; + /* unmask the IRQ that has been left masked + * due to race condition + */ + if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT)) + unmask_irq(desc); break; default: Best regards, Lothar Wassmann -- ___________________________________________________________ Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10 Geschäftsführer: Matthias Kaussen Handelsregistereintrag: Amtsgericht Aachen, HRB 4996 www.karo-electronics.de | info@karo-electronics.de ___________________________________________________________ ^ permalink raw reply related [flat|nested] 12+ messages in thread
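The failure mode described in the report can be reproduced in a small user-space model. This is only a sketch: the `model_*` names are hypothetical stand-ins invented for illustration, not kernel symbols, and the flow is reduced to the mask and unmask decisions that the message describes for a level-triggered oneshot interrupt.

```c
#include <stdbool.h>

/* Same bit layout as the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE        = 0,
	MODEL_IRQ_HANDLED     = 1 << 0,
	MODEL_IRQ_WAKE_THREAD = 1 << 1,
};

/* Hypothetical stand-in for the few pieces of irq state that matter here. */
struct model_desc {
	bool oneshot;      /* handler was requested with IRQF_ONESHOT */
	bool masked;       /* line is currently masked at the controller */
	bool thread_woken; /* threaded handler has been scheduled */
};

/* Level flow as described in the report, before any fix is applied. */
static void model_handle_level_irq(struct model_desc *d,
				   enum model_irqreturn primary_ret)
{
	d->masked = true;              /* mask on entry */

	if (primary_ret & MODEL_IRQ_WAKE_THREAD)
		d->thread_woken = true;

	if (!d->oneshot)
		d->masked = false;     /* non-oneshot: unmask right away */
	/* oneshot: unmasking is left entirely to the thread */
}

/* The threaded handler unmasks the line when it finishes, if it ran. */
static void model_thread_finish(struct model_desc *d)
{
	if (d->thread_woken) {
		d->thread_woken = false;
		d->masked = false;
	}
}
```

In this model, a oneshot interrupt whose primary handler returns `MODEL_IRQ_HANDLED` never reaches the unmask in `model_thread_finish()`, which matches the "masked forever" symptom in the report.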
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever
  2012-02-06  8:14 [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever Lothar Waßmann
@ 2012-02-06 10:42 ` Lars-Peter Clausen
  2012-02-07  9:03 ` Yong Zhang
  1 sibling, 0 replies; 12+ messages in thread
From: Lars-Peter Clausen @ 2012-02-06 10:42 UTC (permalink / raw)
To: Lothar Waßmann; +Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner

On 02/06/2012 09:14 AM, Lothar Waßmann wrote:
> Hi,
>
> I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
>
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
>
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.
>
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
>
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
>
> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
> index 470d08c..93beadb 100644
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
>  			/* Fall through to add to randomness */
>  		case IRQ_HANDLED:
>  			random |= action->flags;
> +			/* unmask the IRQ that has been left masked
> +			 * due to race condition
> +			 */
> +			if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
> +				unmask_irq(desc);
>  			break;
>
>  		default:

I think a better fix is to check the return value of handle_irq_event()
in handle_level_irq() and unmask the irq if the IRQ_WAKE_THREAD bit is
not set. The same should probably also be done for handle_fasteoi_irq().

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever
  2012-02-06  8:14 [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever Lothar Waßmann
  2012-02-06 10:42 ` Lars-Peter Clausen
@ 2012-02-07  9:03 ` Yong Zhang
  2012-02-07 10:01   ` Lothar Waßmann
  1 sibling, 1 reply; 12+ messages in thread
From: Yong Zhang @ 2012-02-07 9:03 UTC (permalink / raw)
To: Lothar Waßmann
Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner

On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> Hi,
>
> I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
>
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
>
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.

Isn't it the responsibility of the driver (say TSC2007)?

In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.

Thanks,
Yong

>
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
>
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
> > diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c > index 470d08c..93beadb 100644 > --- a/kernel/irq/handle.c > +++ b/kernel/irq/handle.c > @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action) > /* Fall through to add to randomness */ > case IRQ_HANDLED: > random |= action->flags; > + /* unmask the IRQ that has been left masked > + * due to race condition > + */ > + if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT)) > + unmask_irq(desc); > break; > > default: > > Best regards, > > Lothar Wassmann > -- > ___________________________________________________________ > > Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen > Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10 > Geschäftsführer: Matthias Kaussen > Handelsregistereintrag: Amtsgericht Aachen, HRB 4996 > > www.karo-electronics.de | info@karo-electronics.de > ___________________________________________________________ > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Only stand for myself ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever 2012-02-07 9:03 ` Yong Zhang @ 2012-02-07 10:01 ` Lothar Waßmann 2012-02-07 12:34 ` Yong Zhang 0 siblings, 1 reply; 12+ messages in thread From: Lothar Waßmann @ 2012-02-07 10:01 UTC (permalink / raw) To: Yong Zhang; +Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner Hi, > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote: > > Hi, > > > > I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012 > > but did not get any response there. So resending to a wider audience > > with improved subject line: > > > > there is a race condition in the threaded IRQ handler code for oneshot > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are > > masked before calling the hard-irq handler and are unmasked only after > > the soft-irq handler has been run. Thus if the hard-irq handler > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq > > will not be called, the interrupt will remain masked forever. > > > > This can happen due to a short pulse on the interrupt line, that > > triggers the interrupt logic, but goes undetected by the hard-irq > > handler. The problem can be reproduced with the TSC2007 touch > > controller driver that uses ONESHOT interrupts. > > Isn't it the responsibility of the driver (say TSC2007)? > > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO. > That would mean it had to return IRQ_WAKE_THREAD unconditionally making the return code useless. And it would cause an extra useless loop through the softirq handler. 
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever
  2012-02-07 10:01   ` Lothar Waßmann
@ 2012-02-07 12:34     ` Yong Zhang
  2012-02-07 12:52       ` Lothar Waßmann
  0 siblings, 1 reply; 12+ messages in thread
From: Yong Zhang @ 2012-02-07 12:34 UTC (permalink / raw)
To: Lothar Waßmann; +Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner

On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote:
> Hi,
>
> > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > > Hi,
> > >
> > > I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> > > but did not get any response there. So resending to a wider audience
> > > with improved subject line:
> > >
> > > there is a race condition in the threaded IRQ handler code for oneshot
> > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > > masked before calling the hard-irq handler and are unmasked only after
> > > the soft-irq handler has been run. Thus if the hard-irq handler
> > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > > will not be called, the interrupt will remain masked forever.
> > >
> > > This can happen due to a short pulse on the interrupt line, that
> > > triggers the interrupt logic, but goes undetected by the hard-irq
> > > handler. The problem can be reproduced with the TSC2007 touch
> > > controller driver that uses ONESHOT interrupts.
> >
> > Isn't it the responsibility of the driver (say TSC2007)?
> >
> > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
> >
> That would mean it had to return IRQ_WAKE_THREAD unconditionally
> making the return code useless.
> And it would cause an extra useless loop through the softirq
> handler.

Yeah, that has been the default behavior since 'threadirqs' was
introduced, and it's safe.

Note that in your patch unmask_irq() is called without holding the
lock, which will introduce another race.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever 2012-02-07 12:34 ` Yong Zhang @ 2012-02-07 12:52 ` Lothar Waßmann 2012-02-07 13:07 ` Lars-Peter Clausen 0 siblings, 1 reply; 12+ messages in thread From: Lothar Waßmann @ 2012-02-07 12:52 UTC (permalink / raw) To: Yong Zhang; +Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner Hi, Yong Zhang writes: > On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote: > > Hi, > > > > > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote: > > > > Hi, > > > > > > > > I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012 > > > > but did not get any response there. So resending to a wider audience > > > > with improved subject line: > > > > > > > > there is a race condition in the threaded IRQ handler code for oneshot > > > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are > > > > masked before calling the hard-irq handler and are unmasked only after > > > > the soft-irq handler has been run. Thus if the hard-irq handler > > > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq > > > > will not be called, the interrupt will remain masked forever. > > > > > > > > This can happen due to a short pulse on the interrupt line, that > > > > triggers the interrupt logic, but goes undetected by the hard-irq > > > > handler. The problem can be reproduced with the TSC2007 touch > > > > controller driver that uses ONESHOT interrupts. > > > > > > Isn't it the responsibility of the driver (say TSC2007)? > > > > > > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO. > > > > > That would mean it had to return IRQ_WAKE_THREAD unconditionally > > making the return code useless. > > And it would cause an extra useless loop through the softirq > > handler. > > Yeah, it's the default behavior when we introduce 'theadirqs', > and it's safe. 
> So, the correct solution would be to remove the check for IRQ_WAKE_THREAD in handle_irq_event_percpu() and always invoke the softirq handler? Note that this problem is not specific to the TSC2007 driver, but may occur with any hardware. Or maybe do the unmasking in handle_irq_event() as proposed by Lars-Peter Clausen in <4F2FAE93.5020205@metafoo.de>? Like that: diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index f7c543a..fbf68c7 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq); void handle_level_irq(unsigned int irq, struct irq_desc *desc) { + int ret; + raw_spin_lock(&desc->lock); mask_ack_irq(desc); @@ -360,10 +362,12 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc) if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) goto out_unlock; - handle_irq_event(desc); + ret = handle_irq_event(desc); - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT)) + if (!irqd_irq_disabled(&desc->irq_data) && + (!(desc->istate & IRQS_ONESHOT) || ret != IRQ_WAKE_THREAD)) unmask_irq(desc); + out_unlock: raw_spin_unlock(&desc->lock); } Lothar Waßmann -- ___________________________________________________________ Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10 Geschäftsführer: Matthias Kaussen Handelsregistereintrag: Amtsgericht Aachen, HRB 4996 www.karo-electronics.de | info@karo-electronics.de ___________________________________________________________ ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever 2012-02-07 12:52 ` Lothar Waßmann @ 2012-02-07 13:07 ` Lars-Peter Clausen 2012-02-07 13:38 ` [PATCH] genirq: Fix race condition in ONESHOT irq handler Lothar Waßmann 0 siblings, 1 reply; 12+ messages in thread From: Lars-Peter Clausen @ 2012-02-07 13:07 UTC (permalink / raw) To: Lothar Waßmann Cc: Yong Zhang, Thomas Gleixner, linux-kernel, linux-arm-kernel On 02/07/2012 01:52 PM, Lothar Waßmann wrote: > Hi, > > Yong Zhang writes: >> On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote: >>> Hi, >>> >>>> On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote: >>>>> Hi, >>>>> >>>>> I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012 >>>>> but did not get any response there. So resending to a wider audience >>>>> with improved subject line: >>>>> >>>>> there is a race condition in the threaded IRQ handler code for oneshot >>>>> interrupts that may lead to disabling an IRQ indefinitely. IRQs are >>>>> masked before calling the hard-irq handler and are unmasked only after >>>>> the soft-irq handler has been run. Thus if the hard-irq handler >>>>> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq >>>>> will not be called, the interrupt will remain masked forever. >>>>> >>>>> This can happen due to a short pulse on the interrupt line, that >>>>> triggers the interrupt logic, but goes undetected by the hard-irq >>>>> handler. The problem can be reproduced with the TSC2007 touch >>>>> controller driver that uses ONESHOT interrupts. >>>> >>>> Isn't it the responsibility of the driver (say TSC2007)? >>>> >>>> In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO. >>>> >>> That would mean it had to return IRQ_WAKE_THREAD unconditionally >>> making the return code useless. >>> And it would cause an extra useless loop through the softirq >>> handler. >> >> Yeah, it's the default behavior when we introduce 'theadirqs', >> and it's safe. 
>> > So, the correct solution would be to remove the check for > IRQ_WAKE_THREAD in handle_irq_event_percpu() and always invoke the > softirq handler? > Note that this problem is not specific to the TSC2007 driver, but may > occur with any hardware. > > Or maybe do the unmasking in handle_irq_event() as proposed by > Lars-Peter Clausen in <4F2FAE93.5020205@metafoo.de>? > Like that: > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c > index f7c543a..fbf68c7 100644 > --- a/kernel/irq/chip.c > +++ b/kernel/irq/chip.c > @@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq); > void > handle_level_irq(unsigned int irq, struct irq_desc *desc) > { > + int ret; This should be irqreturn_t > + > raw_spin_lock(&desc->lock); > mask_ack_irq(desc); > > @@ -360,10 +362,12 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc) > if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) > goto out_unlock; > > - handle_irq_event(desc); > + ret = handle_irq_event(desc); > > - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT)) > + if (!irqd_irq_disabled(&desc->irq_data) && > + (!(desc->istate & IRQS_ONESHOT) || ret != IRQ_WAKE_THREAD)) As I said, check for the bit, not for the value. This will ensure that will also work with shared interrupts. So something like this: !((desc->istate & IRQS_ONESHOT) && (ret & IRQ_WAKE_THREAD))) > unmask_irq(desc); > + > out_unlock: > raw_spin_unlock(&desc->lock); > } > > > Lothar Waßmann ^ permalink raw reply [flat|nested] 12+ messages in thread
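The point about testing the bit rather than the value can be illustrated in isolation. The sketch below assumes that the return values of all handlers on a shared line are OR'ed together before the flow handler sees them; the helper names are hypothetical and exist only to contrast the two conditions discussed in this message.

```c
#include <stdbool.h>

/* Same bit layout as the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE        = 0,
	MODEL_IRQ_HANDLED     = 1 << 0,
	MODEL_IRQ_WAKE_THREAD = 1 << 1,
};

/* Value comparison, as in the first draft of the chip.c change. */
static bool unmask_by_value(bool oneshot, unsigned int ret)
{
	return !oneshot || ret != MODEL_IRQ_WAKE_THREAD;
}

/* Bit test, as suggested in the review. */
static bool unmask_by_bit(bool oneshot, unsigned int ret)
{
	return !(oneshot && (ret & MODEL_IRQ_WAKE_THREAD));
}
```

With two handlers sharing the line, one returning `MODEL_IRQ_HANDLED` and the other `MODEL_IRQ_WAKE_THREAD`, the combined value has both bits set. The value comparison then decides to unmask even though a thread is still pending, while the bit test keeps the line masked. For a single handler the two conditions agree.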
* [PATCH] genirq: Fix race condition in ONESHOT irq handler 2012-02-07 13:07 ` Lars-Peter Clausen @ 2012-02-07 13:38 ` Lothar Waßmann 2012-02-07 17:03 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Lothar Waßmann @ 2012-02-07 13:38 UTC (permalink / raw) To: linux-kernel, linux-kernel Cc: Lothar Waßmann, Thomas Gleixner, Lars-Peter Clausen, Yong Zhang, linux-arm-kernel There is a race condition in the threaded IRQ handler code for oneshot interrupts that may lead to disabling an IRQ indefinitely. IRQs are masked before calling the hard-irq handler and are unmasked only after the soft-irq handler has been run. Thus if the hard-irq handler returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq will not be called, the interrupt will remain masked forever. This can happen due to a short pulse on the interrupt line, that triggers the interrupt logic, but goes undetected by the hard-irq handler. The problem can be reproduced with the TSC2007 touch controller driver that uses ONESHOT interrupts. The problem arises also with interrupt controllers that latch a level triggered IRQ until it is acknowledged (like the i.MX28 does). In this case the IRQ status bit will remain asserted after the soft-irq finishes and retrigger the interrupt while the interrupt line is already deasserted. 
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de> --- kernel/irq/chip.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index f7c543a..74fdef9 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq); void handle_level_irq(unsigned int irq, struct irq_desc *desc) { + irqreturn_t ret; + raw_spin_lock(&desc->lock); mask_ack_irq(desc); @@ -360,10 +362,13 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc) if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) goto out_unlock; - handle_irq_event(desc); + ret = handle_irq_event(desc); - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT)) + if (!irqd_irq_disabled(&desc->irq_data) && + (!(desc->istate & IRQS_ONESHOT) || + !(ret & IRQ_WAKE_THREAD))) unmask_irq(desc); + out_unlock: raw_spin_unlock(&desc->lock); } -- 1.7.2.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] genirq: Fix race condition in ONESHOT irq handler 2012-02-07 13:38 ` [PATCH] genirq: Fix race condition in ONESHOT irq handler Lothar Waßmann @ 2012-02-07 17:03 ` Thomas Gleixner 2012-02-08 6:05 ` Lothar Waßmann 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2012-02-07 17:03 UTC (permalink / raw) To: Lothar Waßmann Cc: linux-kernel, Lars-Peter Clausen, Yong Zhang, linux-arm-kernel [-- Attachment #1: Type: TEXT/PLAIN, Size: 4460 bytes --] On Tue, 7 Feb 2012, Lothar Waßmann wrote: > There is a race condition in the threaded IRQ handler code for oneshot > interrupts that may lead to disabling an IRQ indefinitely. IRQs are > masked before calling the hard-irq handler and are unmasked only after > the soft-irq handler has been run. Thus if the hard-irq handler > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq Well, oneshot mode interrupts always had the semantics that the threaded handler needs to run unconditionally. In fact the oneshot mode was implemented to handle hardware which cannot do anything in hard interrupt context to avoid the ugliness of a primary handler calling disable_irq_nosync(). So it looks like driver developers decided that the oneshot mode might be interesting with a primary handler as well. I can see the reason why the tsc2007 driver uses it, but that does not make it a bug in the core code in the first place. Though we should handle it and the problem not only arises with the IRQ_HANDLED return code, it also arises with IRQ_NONE. > will not be called, the interrupt will remain masked forever. > > This can happen due to a short pulse on the interrupt line, that > triggers the interrupt logic, but goes undetected by the hard-irq > handler. The problem can be reproduced with the TSC2007 touch > controller driver that uses ONESHOT interrupts. It should not return IRQ_HANDLED in that case, as the real thing is a spurious interrupt. 
> The problem arises also with interrupt controllers that latch a level > triggered IRQ until it is acknowledged (like the i.MX28 does). > In this case the IRQ status bit will remain asserted after the > soft-irq finishes and retrigger the interrupt while the interrupt line > is already deasserted. This does not make sense. We acknowledge interrupts via mask_ack_irq() right on entry of handle_level_irq(). So either the interrupt controller is completely hosed or this explanation is bogus. > Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de> > --- > kernel/irq/chip.c | 9 +++++++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c > index f7c543a..74fdef9 100644 > --- a/kernel/irq/chip.c > +++ b/kernel/irq/chip.c > @@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq); > void > handle_level_irq(unsigned int irq, struct irq_desc *desc) > { > + irqreturn_t ret; > + > raw_spin_lock(&desc->lock); > mask_ack_irq(desc); > > @@ -360,10 +362,13 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc) > if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) > goto out_unlock; > > - handle_irq_event(desc); > + ret = handle_irq_event(desc); > > - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT)) > + if (!irqd_irq_disabled(&desc->irq_data) && > + (!(desc->istate & IRQS_ONESHOT) || > + !(ret & IRQ_WAKE_THREAD))) Hmm, that looks ugly and it misses the same fixup for handle_fasteoi_irq() including proper comments. The following patch should address both cases. 
Thanks, tglx =================================================================== --- linux-3.2.orig/kernel/irq/chip.c +++ linux-3.2/kernel/irq/chip.c @@ -330,6 +330,24 @@ out_unlock: } EXPORT_SYMBOL_GPL(handle_simple_irq); +/* + * Called unconditionally from handle_level_irq() and only for oneshot + * interrupts from handle_fasteoi_irq() + */ +static void cond_unmask_irq(struct irq_desc *desc) +{ + /* + * We need to unmask in the following cases: + * - Standard level irq (IRQF_ONESHOT is not set) + * - Oneshot irq which did not wake the thread (caused by a + * spurious interrupt or a primary handler handling it + * completely). + */ + if (!irqd_irq_disabled(&desc->irq_data) && + irqd_irq_masked(&desc->irq_data) && !desc->threads_oneshot) + unmask_irq(desc); +} + /** * handle_level_irq - Level type irq handler * @irq: the interrupt number @@ -362,8 +380,8 @@ handle_level_irq(unsigned int irq, struc handle_irq_event(desc); - if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT)) - unmask_irq(desc); + cond_unmask_irq(desc); + out_unlock: raw_spin_unlock(&desc->lock); } @@ -417,6 +435,9 @@ handle_fasteoi_irq(unsigned int irq, str preflow_handler(desc); handle_irq_event(desc); + if (desc->istate & IRQS_ONESHOT) + cond_unmask_irq(desc); + out_eoi: desc->irq_data.chip->irq_eoi(&desc->irq_data); out_unlock: ^ permalink raw reply [flat|nested] 12+ messages in thread
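The decision made by cond_unmask_irq() in the patch above can be exercised outside the kernel with a small model. This is a sketch under stated assumptions: `struct model_desc` and `model_cond_unmask_irq()` are hypothetical names, the three fields stand in for irqd_irq_disabled(), irqd_irq_masked() and desc->threads_oneshot, and `threads_oneshot` is assumed to be a bitmask of oneshot threads that have been woken and not yet finished.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state that cond_unmask_irq() inspects. */
struct model_desc {
	bool disabled;                 /* models irqd_irq_disabled() */
	bool masked;                   /* models irqd_irq_masked() */
	unsigned long threads_oneshot; /* oneshot threads still pending */
};

/*
 * Unmask only when the line is enabled, currently masked, and no
 * oneshot thread is pending. A pending thread keeps the line masked
 * until that thread has finished.
 */
static void model_cond_unmask_irq(struct model_desc *d)
{
	if (!d->disabled && d->masked && !d->threads_oneshot)
		d->masked = false;
}
```

In this model, a primary handler that handles the interrupt completely, or a spurious interrupt, leaves `threads_oneshot` at zero, so the line is unmasked immediately instead of staying masked.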
* Re: [PATCH] genirq: Fix race condition in ONESHOT irq handler 2012-02-07 17:03 ` Thomas Gleixner @ 2012-02-08 6:05 ` Lothar Waßmann 2012-02-08 10:38 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Lothar Waßmann @ 2012-02-08 6:05 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, Lars-Peter Clausen, Yong Zhang, linux-arm-kernel Hi, Thomas Gleixner writes: > On Tue, 7 Feb 2012, Lothar Waßmann wrote: > > > There is a race condition in the threaded IRQ handler code for oneshot > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are > > masked before calling the hard-irq handler and are unmasked only after > > the soft-irq handler has been run. Thus if the hard-irq handler > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq > > Well, oneshot mode interrupts always had the semantics that the > threaded handler needs to run unconditionally. In fact the oneshot > mode was implemented to handle hardware which cannot do anything in > hard interrupt context to avoid the ugliness of a primary handler > calling disable_irq_nosync(). > > So it looks like driver developers decided that the oneshot mode might > be interesting with a primary handler as well. I can see the reason > why the tsc2007 driver uses it, but that does not make it a bug in the > core code in the first place. > Then maybe the core code should not check the return value of the primary handler for IRQ_WAKE_THREAD but call the secondary handler unconditionally for ONESHOT interrupts. Or it should be at least documented somewhere that primary handlers have to return IRQ_WAKE_THREAD in any case for oneshot interrupts. > > The problem arises also with interrupt controllers that latch a level > > triggered IRQ until it is acknowledged (like the i.MX28 does). > > In this case the IRQ status bit will remain asserted after the > > soft-irq finishes and retrigger the interrupt while the interrupt line > > is already deasserted. 
> > This does not make sense. We acknowledge interrupts via mask_ack_irq() > right on entry of handle_level_irq(). So either the interrupt > That's right. But at that point the IRQ line is still asserted and since it is a level IRQ this will not actually clear the interrupt status bit. Normally the IRQ status bit would self-clear when the IRQ line is being deasserted (in this case by removing the finger from the touch panel). But the i.MX28 leaves the IRQ status bit set and it takes another write to the IRQ status register to remove the bogus IRQ status. > controller is completely hosed or this explanation is bogus. > The first is the case. Lothar Waßmann -- ___________________________________________________________ Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10 Geschäftsführer: Matthias Kaussen Handelsregistereintrag: Amtsgericht Aachen, HRB 4996 www.karo-electronics.de | info@karo-electronics.de ___________________________________________________________ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] genirq: Fix race condition in ONESHOT irq handler 2012-02-08 6:05 ` Lothar Waßmann @ 2012-02-08 10:38 ` Thomas Gleixner 2012-02-09 8:40 ` Lothar Waßmann 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2012-02-08 10:38 UTC (permalink / raw) To: Lothar Waßmann Cc: linux-kernel, Lars-Peter Clausen, Yong Zhang, linux-arm-kernel [-- Attachment #1: Type: TEXT/PLAIN, Size: 1956 bytes --] On Wed, 8 Feb 2012, Lothar Waßmann wrote: > > So it looks like driver developers decided that the oneshot mode might > > be interesting with a primary handler as well. I can see the reason > > why the tsc2007 driver uses it, but that does not make it a bug in the > > core code in the first place. > > > Then maybe the core code should not check the return value > of the primary handler for IRQ_WAKE_THREAD but call the secondary > handler unconditionally for ONESHOT interrupts. > Or it should be at least documented somewhere that primary handlers > have to return IRQ_WAKE_THREAD in any case for oneshot interrupts. Well, you know how good we are with documentation :) > > > The problem arises also with interrupt controllers that latch a level > > > triggered IRQ until it is acknowledged (like the i.MX28 does). > > > In this case the IRQ status bit will remain asserted after the > > > soft-irq finishes and retrigger the interrupt while the interrupt line > > > is already deasserted. > > > > This does not make sense. We acknowledge interrupts via mask_ack_irq() > > right on entry of handle_level_irq(). So either the interrupt > > > That's right. But at that point the IRQ line is still asserted and > since it is a level IRQ this will not actually clear the interrupt > status bit. Normally the IRQ status bit would self-clear when the IRQ > line is being deasserted (in this case by removing the finger from the > touch panel). But the i.MX28 leaves the IRQ status bit set and it > takes another write to the IRQ status register to remove the bogus IRQ > status. 
So the question is whether the imx irq chip implementation should write to the status register on unmask for level type irqs to avoid spurious interrupts being generated in the first place. This is not only an optimization for threaded interrupts, afaict this spurious effect should happen with non threaded interrupts as well. Did my patch work for you ? Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] genirq: Fix race condition in ONESHOT irq handler 2012-02-08 10:38 ` Thomas Gleixner @ 2012-02-09 8:40 ` Lothar Waßmann 0 siblings, 0 replies; 12+ messages in thread From: Lothar Waßmann @ 2012-02-09 8:40 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, Lars-Peter Clausen, Yong Zhang, linux-arm-kernel Hi, Thomas Gleixner writes: > On Wed, 8 Feb 2012, Lothar Waßmann wrote: > > > > The problem arises also with interrupt controllers that latch a level > > > > triggered IRQ until it is acknowledged (like the i.MX28 does). > > > > In this case the IRQ status bit will remain asserted after the > > > > soft-irq finishes and retrigger the interrupt while the interrupt line > > > > is already deasserted. > > > > > > This does not make sense. We acknowledge interrupts via mask_ack_irq() > > > right on entry of handle_level_irq(). So either the interrupt > > > > > That's right. But at that point the IRQ line is still asserted and > > since it is a level IRQ this will not actually clear the interrupt > > status bit. Normally the IRQ status bit would self-clear when the IRQ > > line is being deasserted (in this case by removing the finger from the > > touch panel). But the i.MX28 leaves the IRQ status bit set and it > > takes another write to the IRQ status register to remove the bogus IRQ > > status. > > So the question is whether the imx irq chip implementation should > write to the status register on unmask for level type irqs to avoid > spurious interrupts being generated in the first place. This is not > only an optimization for threaded interrupts, afaict this spurious > effect should happen with non threaded interrupts as well. > > Did my patch work for you ? > Sorry, I couldn't test it earlier. Yes, it works. 