linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
@ 2013-03-07 13:53 Thomas Gleixner
  2013-03-08 14:47 ` Till Straumann
  2014-05-03 21:19 ` [tip:irq/core] " tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2013-03-07 13:53 UTC (permalink / raw)
  To: LKML; +Cc: Till Straumann

Till reported that he spurious interrupt detection of threaded
interrupts is broken in two ways:

- note_interrupt() is called for each action thread of a shared
  interrupt line. That's wrong as we are only interested whether none
  of the device drivers felt responsible for the interrupt, but by
  calling multiple times for a single interrupt line we account
  IRQ_NONE even if one of the drivers felt responsible.

- note_interrupt() when called from the thread handler is not
  serialized. That leaves the members of irq_desc which are used for
  the spurious detection unprotected.

To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.

If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return. 

If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.

If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.

Reported-by: Till Straumann <strauman@slac.stanford.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdesc.h |    4 +++
 kernel/irq/internals.h  |    4 +++
 kernel/irq/manage.c     |    4 +--
 kernel/irq/spurious.c   |   57 ++++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 63 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/irqdesc.h
===================================================================
--- linux-2.6.orig/include/linux/irqdesc.h
+++ linux-2.6/include/linux/irqdesc.h
@@ -27,6 +27,8 @@ struct irq_desc;
  * @irq_count:		stats field to detect stalled irqs
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
+ * @threads_handled:	stats field for deferred spurious detection of threaded handlers
+ * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
  * @lock:		locking for SMP
  * @affinity_hint:	hint to user space for preferred irq affinity
  * @affinity_notify:	context for notification of affinity changes
@@ -52,6 +54,8 @@ struct irq_desc {
 	unsigned int		irq_count;	/* For detecting broken IRQs */
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
+	atomic_t		threads_handled;
+	int			threads_handled_last;
 	raw_spinlock_t		lock;
 	struct cpumask		*percpu_enabled;
 #ifdef CONFIG_SMP
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -55,6 +55,10 @@ enum {
 	IRQS_SUSPENDED		= 0x00000800,
 };
 
+/* Bits for spurious detection of threaded handlers */
+#define SPURIOUS_MASK		0x7fffffff
+#define SPURIOUS_DEFERRED	0x80000000
+
 #include "debug.h"
 #include "settings.h"
 
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -867,8 +867,8 @@ static int irq_thread(void *data)
 		irq_thread_check_affinity(desc, action);
 
 		action_ret = handler_fn(desc, action);
-		if (!noirqdebug)
-			note_interrupt(action->irq, desc, action_ret);
+		if (!noirqdebug && action_ret == IRQ_HANDLED)
+			atomic_inc(&desc->threads_handled);
 
 		wake_threads_waitq(desc);
 	}
Index: linux-2.6/kernel/irq/spurious.c
===================================================================
--- linux-2.6.orig/kernel/irq/spurious.c
+++ linux-2.6/kernel/irq/spurious.c
@@ -271,15 +271,64 @@ void note_interrupt(unsigned int irq, st
 	if (desc->istate & IRQS_POLL_INPROGRESS)
 		return;
 
-	/* we get here again via the threaded handler */
-	if (action_ret == IRQ_WAKE_THREAD)
-		return;
-
 	if (bad_action_ret(action_ret)) {
 		report_bad_irq(irq, desc, action_ret);
 		return;
 	}
 
+	/*
+	 * We cannot call note_interrupt from the threaded handler. So
+	 * the threaded handlers store whether they sucessfully
+	 * handled an interrupt. Here is the right place to take care
+	 * of this.
+	 */
+	if (action_ret & IRQ_WAKE_THREAD) {
+		/*
+		 * There is a thread woken. Check whether one of the
+		 * shared primary handlers returned IRQ_HANDLED. If
+		 * not we defer the spurious detection to the next
+		 * interrupt.
+		 */
+		if (action_ret == IRQ_WAKE_THREAD) {
+			int handled;
+
+			/*
+			 * We use bit 31 of thread_handled_last to
+			 * note the deferred spurious detection
+			 * active. No locking necessary as
+			 * thread_handled_last is only accessed here
+			 * and we have the guarantee that hard
+			 * interrupts are not reentrant.
+			 */
+			if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
+				desc->threads_handled_last |= SPURIOUS_DEFERRED;
+				return;
+			}
+			/*
+			 * Check whether one of the threaded handlers
+			 * returned IRQ_HANDLED since the last
+			 * interrupt happened. Or the spurious
+			 * deferred flag for ease of comparison and
+			 * preservation in the writeback.
+			 */
+			handled = atomic_read(&desc->threads_handled);
+			handled |= SPURIOUS_DEFERRED;
+			if (handled != desc->threads_handled_last) {
+				action_ret = IRQ_HANDLED;
+				desc->threads_handled_last = handled;
+			} else {
+				action_ret = IRQ_NONE;
+			}
+		} else {
+			/*
+			 * One of the primary handlers returned
+			 * IRQ_HANDLED. So we don't care about the
+			 * threaded handlers on the same line.
+			 */
+			desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
+		}
+	}
+
 	if (unlikely(action_ret == IRQ_NONE)) {
 		/*
 		 * If we are seeing only the odd spurious IRQ caused by

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-07 13:53 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Thomas Gleixner
@ 2013-03-08 14:47 ` Till Straumann
  2013-03-08 16:12   ` Thomas Gleixner
  2014-05-03 21:19 ` [tip:irq/core] " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 14+ messages in thread
From: Till Straumann @ 2013-03-08 14:47 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML

Thomas.

Thanks for your quick reply. I had a look at the patch and I have a few 
comments.

1) I'm not sure adding the SPURIOUS_DEFERRED flag into 
threads_handled_last is OK
     - what happens if the atomic_t counter can hold more than 31 bits? 
In this case,
     when thread handlers increment the counter there is interference 
with the flag.
     If this is not harmful then it is at least ugly.

     I'm not as familiar with the code as you are but wouldn't it be 
simpler to always
     defer spurious detection thus avoiding to have to keep track of the 
state (deferral
     active/inactive)? I.e., if any primary handler returns IRQ_HANDLED then
     we simply increment the counter. note_interrupt() could then always 
compare the
     previous count to the current count and if they are equal conclude 
that the interrupt
     was not handled:

     handle_irq_event_percpu()
     {
     ...
       if (!noirqdebug)
         note_interrupt(irq, desc, retval);

       if ( (retval & IRQ_HANDLED) )
         atomic_inc(&desc->threads_handled);
     }

     and in 'note_interrupt()'

       handled = atomic_read(&desc->threads_handled);
       if ( desc->threads_handled_last == handled ) {
         action_ret = IRQ_NONE;
       } else {
         action_ret = IRQ_HANDLED;
         desc->threads_handled_last = handled;
       }

    Either way - I'm not sure what deferral does to the part of the 
algorithm
    in note_interrupt() which deals with misrouted interrupts since the
    'action_ret' that goes into try_misrouted_irq() is delayed by one 
interrupt
    cycle.


2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and 
I believe
    this routine would also need to increment the 'threads_handled' 
counter rather
    than calling note_interrupt.

Cheers
- Till

On 03/07/2013 02:53 PM, Thomas Gleixner wrote:
> Till reported that he spurious interrupt detection of threaded
> interrupts is broken in two ways:
>
> - note_interrupt() is called for each action thread of a shared
>    interrupt line. That's wrong as we are only interested whether none
>    of the device drivers felt responsible for the interrupt, but by
>    calling multiple times for a single interrupt line we account
>    IRQ_NONE even if one of the drivers felt responsible.
>
> - note_interrupt() when called from the thread handler is not
>    serialized. That leaves the members of irq_desc which are used for
>    the spurious detection unprotected.
>
> To solve this we need to defer the spurious detection of a threaded
> interrupt to the next hardware interrupt context where we have
> implicit serialization.
>
> If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
> check whether the previous interrupt requested a deferred check. If
> not, we request a deferred check for the next hardware interrupt and
> return.
>
> If set, we check whether one of the interrupt threads signaled
> success. Depending on this information we feed the result into the
> spurious detector.
>
> If one primary handler of a shared interrupt returns IRQ_HANDLED we
> disable the deferred check of irq threads on the same line, as we have
> found at least one device driver who cared.
>
> Reported-by: Till Straumann <strauman@slac.stanford.edu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>   include/linux/irqdesc.h |    4 +++
>   kernel/irq/internals.h  |    4 +++
>   kernel/irq/manage.c     |    4 +--
>   kernel/irq/spurious.c   |   57 ++++++++++++++++++++++++++++++++++++++++++++----
>   4 files changed, 63 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/include/linux/irqdesc.h
> ===================================================================
> --- linux-2.6.orig/include/linux/irqdesc.h
> +++ linux-2.6/include/linux/irqdesc.h
> @@ -27,6 +27,8 @@ struct irq_desc;
>    * @irq_count:		stats field to detect stalled irqs
>    * @last_unhandled:	aging timer for unhandled count
>    * @irqs_unhandled:	stats field for spurious unhandled interrupts
> + * @threads_handled:	stats field for deferred spurious detection of threaded handlers
> + * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
>    * @lock:		locking for SMP
>    * @affinity_hint:	hint to user space for preferred irq affinity
>    * @affinity_notify:	context for notification of affinity changes
> @@ -52,6 +54,8 @@ struct irq_desc {
>   	unsigned int		irq_count;	/* For detecting broken IRQs */
>   	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
>   	unsigned int		irqs_unhandled;
> +	atomic_t		threads_handled;
> +	int			threads_handled_last;
>   	raw_spinlock_t		lock;
>   	struct cpumask		*percpu_enabled;
>   #ifdef CONFIG_SMP
> Index: linux-2.6/kernel/irq/internals.h
> ===================================================================
> --- linux-2.6.orig/kernel/irq/internals.h
> +++ linux-2.6/kernel/irq/internals.h
> @@ -55,6 +55,10 @@ enum {
>   	IRQS_SUSPENDED		= 0x00000800,
>   };
>   
> +/* Bits for spurious detection of threaded handlers */
> +#define SPURIOUS_MASK		0x7fffffff
> +#define SPURIOUS_DEFERRED	0x80000000
> +
>   #include "debug.h"
>   #include "settings.h"
>   
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -867,8 +867,8 @@ static int irq_thread(void *data)
>   		irq_thread_check_affinity(desc, action);
>   
>   		action_ret = handler_fn(desc, action);
> -		if (!noirqdebug)
> -			note_interrupt(action->irq, desc, action_ret);
> +		if (!noirqdebug && action_ret == IRQ_HANDLED)
> +			atomic_inc(&desc->threads_handled);
>   
>   		wake_threads_waitq(desc);
>   	}
> Index: linux-2.6/kernel/irq/spurious.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/spurious.c
> +++ linux-2.6/kernel/irq/spurious.c
> @@ -271,15 +271,64 @@ void note_interrupt(unsigned int irq, st
>   	if (desc->istate & IRQS_POLL_INPROGRESS)
>   		return;
>   
> -	/* we get here again via the threaded handler */
> -	if (action_ret == IRQ_WAKE_THREAD)
> -		return;
> -
>   	if (bad_action_ret(action_ret)) {
>   		report_bad_irq(irq, desc, action_ret);
>   		return;
>   	}
>   
> +	/*
> +	 * We cannot call note_interrupt from the threaded handler. So
> +	 * the threaded handlers store whether they sucessfully
> +	 * handled an interrupt. Here is the right place to take care
> +	 * of this.
> +	 */
> +	if (action_ret & IRQ_WAKE_THREAD) {
> +		/*
> +		 * There is a thread woken. Check whether one of the
> +		 * shared primary handlers returned IRQ_HANDLED. If
> +		 * not we defer the spurious detection to the next
> +		 * interrupt.
> +		 */
> +		if (action_ret == IRQ_WAKE_THREAD) {
> +			int handled;
> +
> +			/*
> +			 * We use bit 31 of thread_handled_last to
> +			 * note the deferred spurious detection
> +			 * active. No locking necessary as
> +			 * thread_handled_last is only accessed here
> +			 * and we have the guarantee that hard
> +			 * interrupts are not reentrant.
> +			 */
> +			if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
> +				desc->threads_handled_last |= SPURIOUS_DEFERRED;
> +				return;
> +			}
> +			/*
> +			 * Check whether one of the threaded handlers
> +			 * returned IRQ_HANDLED since the last
> +			 * interrupt happened. Or the spurious
> +			 * deferred flag for ease of comparison and
> +			 * preservation in the writeback.
> +			 */
> +			handled = atomic_read(&desc->threads_handled);
> +			handled |= SPURIOUS_DEFERRED;
> +			if (handled != desc->threads_handled_last) {
> +				action_ret = IRQ_HANDLED;
> +				desc->threads_handled_last = handled;
> +			} else {
> +				action_ret = IRQ_NONE;
> +			}
> +		} else {
> +			/*
> +			 * One of the primary handlers returned
> +			 * IRQ_HANDLED. So we don't care about the
> +			 * threaded handlers on the same line.
> +			 */
> +			desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
> +		}
> +	}
> +
>   	if (unlikely(action_ret == IRQ_NONE)) {
>   		/*
>   		 * If we are seeing only the odd spurious IRQ caused by


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-08 14:47 ` Till Straumann
@ 2013-03-08 16:12   ` Thomas Gleixner
  2013-03-08 17:19     ` Till Straumann
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2013-03-08 16:12 UTC (permalink / raw)
  To: Till Straumann; +Cc: LKML

On Fri, 8 Mar 2013, Till Straumann wrote:
 
> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
>    threads_handled_last is OK - what happens if the atomic_t counter
>    can hold more than 31 bits? In this case, when thread handlers
>    increment the counter there is interference with the flag.  If
>    this is not harmful then it is at least ugly.

atomic_t is going to stay 32 bit otherwise we'll have more horrible
problems than that one.

>     I'm not as familiar with the code as you are but wouldn't it be
> simpler to always defer spurious detection thus avoiding to have to
> keep track of the state (deferral active/inactive)? I.e., if any
> primary handler returns IRQ_HANDLED then we simply increment the
> counter. note_interrupt() could then always compare the previous
> count to the current count and if they are equal conclude that the
> interrupt was not handled:

Yeah, we could do it that way. Would probably be simpler.
 
>     handle_irq_event_percpu()
>     {
>     ...
>       if (!noirqdebug)
>         note_interrupt(irq, desc, retval);
> 
>       if ( (retval & IRQ_HANDLED) )
>         atomic_inc(&desc->threads_handled);
>     }
> 
>     and in 'note_interrupt()'
> 
>       handled = atomic_read(&desc->threads_handled);
>       if ( desc->threads_handled_last == handled ) {
>         action_ret = IRQ_NONE;
>       } else {
>         action_ret = IRQ_HANDLED;
>         desc->threads_handled_last = handled;
>       }
> 
>    Either way - I'm not sure what deferral does to the part of the algorithm
>    in note_interrupt() which deals with misrouted interrupts since the
>    'action_ret' that goes into try_misrouted_irq() is delayed by one interrupt
>    cycle.

That should not matter much methinks, but I'll try what explodes on
one of my affected machines.
 
> 
> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
> believe
>    this routine would also need to increment the 'threads_handled' counter
> rather
>    than calling note_interrupt.

That's a different issue. The nested_irq handler is for interrupts
which are demultiplexed by a primary threaded handler. That interrupt
is never handled in hard interrupt context. It's always called from
the context of the demultiplxing thread.

Thanks,

	tglx

 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-08 16:12   ` Thomas Gleixner
@ 2013-03-08 17:19     ` Till Straumann
  2013-03-08 19:41       ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Till Straumann @ 2013-03-08 17:19 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML

On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
> On Fri, 8 Mar 2013, Till Straumann wrote:
>   
>> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
>>     threads_handled_last is OK - what happens if the atomic_t counter
>>     can hold more than 31 bits? In this case, when thread handlers
>>     increment the counter there is interference with the flag.  If
>>     this is not harmful then it is at least ugly.
> atomic_t is going to stay 32 bit otherwise we'll have more horrible
> problems than that one.
I know. But this means that when the counter overflows 31 bits (2^31 - 1)
then it spills into the SPURIOUS_DEFERRED flag, right?
>
>>      I'm not as familiar with the code as you are but wouldn't it be
>> simpler to always defer spurious detection thus avoiding to have to
>> keep track of the state (deferral active/inactive)? I.e., if any
>> primary handler returns IRQ_HANDLED then we simply increment the
>> counter. note_interrupt() could then always compare the previous
>> count to the current count and if they are equal conclude that the
>> interrupt was not handled:
> Yeah, we could do it that way. Would probably be simpler.
>   
>>      handle_irq_event_percpu()
>>      {
>>      ...
>>        if (!noirqdebug)
>>          note_interrupt(irq, desc, retval);
>>
>>        if ( (retval & IRQ_HANDLED) )
>>          atomic_inc(&desc->threads_handled);
>>      }
>>
>>      and in 'note_interrupt()'
>>
>>        handled = atomic_read(&desc->threads_handled);
>>        if ( desc->threads_handled_last == handled ) {
>>          action_ret = IRQ_NONE;
>>        } else {
>>          action_ret = IRQ_HANDLED;
>>          desc->threads_handled_last = handled;
>>        }
>>
>>     Either way - I'm not sure what deferral does to the part of the algorithm
>>     in note_interrupt() which deals with misrouted interrupts since the
>>     'action_ret' that goes into try_misrouted_irq() is delayed by one interrupt
>>     cycle.
> That should not matter much methinks, but I'll try what explodes on
> one of my affected machines.
>   
>> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
>> believe
>>     this routine would also need to increment the 'threads_handled' counter
>> rather
>>     than calling note_interrupt.
> That's a different issue. The nested_irq handler is for interrupts
> which are demultiplexed by a primary threaded handler. That interrupt
> is never handled in hard interrupt context. It's always called from
> the context of the demultiplxing thread.
So you are saying that there 'handle_nested_irq()' can never be executed
from more than one thread for a single interrupt?

I find, however, that e.g., the gpio-sx150x.c driver calls

request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
handle_nested_irq(). It would thus be possible that multiple drivers
could share an interrupt and each driver would call handle_nested_irq()
which in-turn executes note_interrupt(). This would again raise the
issues we already discussed (note_interrupt() not serialized and thinking
that an interrupt was not handled because it was handled by a different
thread).

Probably I'm missing something regarding the use of nested interrupts
- I would really appreciate if you could help me understand why
it should be OK for handle_nested_irq() to call note_interrupt().

Cheers
- Till
>
> Thanks,
>
> 	tglx
>
>   


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-08 17:19     ` Till Straumann
@ 2013-03-08 19:41       ` Thomas Gleixner
  2013-03-12 13:22         ` Till Straumann
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2013-03-08 19:41 UTC (permalink / raw)
  To: Till Straumann; +Cc: LKML

On Fri, 8 Mar 2013, Till Straumann wrote:
> On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
> > On Fri, 8 Mar 2013, Till Straumann wrote:
> >   
> > > 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
> > >     threads_handled_last is OK - what happens if the atomic_t counter
> > >     can hold more than 31 bits? In this case, when thread handlers
> > >     increment the counter there is interference with the flag.  If
> > >     this is not harmful then it is at least ugly.
> > atomic_t is going to stay 32 bit otherwise we'll have more horrible
> > problems than that one.
> I know. But this means that when the counter overflows 31 bits (2^31 - 1)
> then it spills into the SPURIOUS_DEFERRED flag, right?

Gah, yes. /me should stop doing overoptimizations :)
 
> > > 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
> > > believe
> > >     this routine would also need to increment the 'threads_handled'
> > > counter
> > > rather
> > >     than calling note_interrupt.
> > That's a different issue. The nested_irq handler is for interrupts
> > which are demultiplexed by a primary threaded handler. That interrupt
> > is never handled in hard interrupt context. It's always called from
> > the context of the demultiplxing thread.
> So you are saying that there 'handle_nested_irq()' can never be executed
> from more than one thread for a single interrupt?
> 
> I find, however, that e.g., the gpio-sx150x.c driver calls
> 
> request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
> handle_nested_irq(). It would thus be possible that multiple drivers
> could share an interrupt and each driver would call handle_nested_irq()
> which in-turn executes note_interrupt(). This would again raise the
> issues we already discussed (note_interrupt() not serialized and thinking
> that an interrupt was not handled because it was handled by a different
> thread).
> 
> Probably I'm missing something regarding the use of nested interrupts
> - I would really appreciate if you could help me understand why
> it should be OK for handle_nested_irq() to call note_interrupt().

The thing about nested irqs is:

main irq is threaded (requested by the driver for stuff like i2c)

The handler of this irq reads a pending irq register in the chip and
then invokes handle_nested_irq() for each of the pending bits.

Those interrupts cannot be shared even if the driver request them as
shared:

        irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
        raw_spin_unlock_irq(&desc->lock);

        action_ret = action->thread_fn(action->irq, action->dev_id);
        if (!noirqdebug)
                note_interrupt(irq, desc, action_ret);

        raw_spin_lock_irq(&desc->lock);
        irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);

So there is no loop over action->next. And even if that code would
loop over action next, then it still would be serialized

main irq is raised

     -> wake thread

thread runs

       read pending reg()

       for each pending bit {

       	   handle_nested_irq();
		action = desc->action;

		while (action) {
		      action->thread_fn()
		      action = action->next)
		}
		note_interrupt();
       }

thread done

Hope that helps. Thanks,

     tglx

      




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-08 19:41       ` Thomas Gleixner
@ 2013-03-12 13:22         ` Till Straumann
  0 siblings, 0 replies; 14+ messages in thread
From: Till Straumann @ 2013-03-12 13:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML

OK. Thanks for the explanation.

However - even if not strictly necessary - wouldn't it
be simpler and the code easier to understand if
spurious interrupt detection just always would use the
deferred algorithm?
  - after handling by whatever scheme (hard, threaded, nested)
    the counter is incremented
  - the next hard IRQ calls note_interrupt() which checks
    if the counter has changed since last time.

Just my 2 cents...

Thanks again
- Till

On 03/08/2013 08:41 PM, Thomas Gleixner wrote:
> On Fri, 8 Mar 2013, Till Straumann wrote:
>> On 03/08/2013 05:12 PM, Thomas Gleixner wrote:
>>> On Fri, 8 Mar 2013, Till Straumann wrote:
>>>    
>>>> 1) I'm not sure adding the SPURIOUS_DEFERRED flag into
>>>>      threads_handled_last is OK - what happens if the atomic_t counter
>>>>      can hold more than 31 bits? In this case, when thread handlers
>>>>      increment the counter there is interference with the flag.  If
>>>>      this is not harmful then it is at least ugly.
>>> atomic_t is going to stay 32 bit otherwise we'll have more horrible
>>> problems than that one.
>> I know. But this means that when the counter overflows 31 bits (2^31 - 1)
>> then it spills into the SPURIOUS_DEFERRED flag, right?
> Gah, yes. /me should stop doing overoptimizations :)
>   
>>>> 2) note_interrupt is also called from irq/chip.c:handle_nested_irq() and I
>>>> believe
>>>>      this routine would also need to increment the 'threads_handled'
>>>> counter
>>>> rather
>>>>      than calling note_interrupt.
>>> That's a different issue. The nested_irq handler is for interrupts
>>> which are demultiplexed by a primary threaded handler. That interrupt
>>> is never handled in hard interrupt context. It's always called from
>>> the context of the demultiplxing thread.
>> So you are saying that there 'handle_nested_irq()' can never be executed
>> from more than one thread for a single interrupt?
>>
>> I find, however, that e.g., the gpio-sx150x.c driver calls
>>
>> request_threaded_irq() with IRQF_SHARED set and it's thread_fn does call
>> handle_nested_irq(). It would thus be possible that multiple drivers
>> could share an interrupt and each driver would call handle_nested_irq()
>> which in-turn executes note_interrupt(). This would again raise the
>> issues we already discussed (note_interrupt() not serialized and thinking
>> that an interrupt was not handled because it was handled by a different
>> thread).
>>
>> Probably I'm missing something regarding the use of nested interrupts
>> - I would really appreciate if you could help me understand why
>> it should be OK for handle_nested_irq() to call note_interrupt().
> The thing about nested irqs is:
>
> main irq is threaded (requested by the driver for stuff like i2c)
>
> The handler of this irq reads a pending irq register in the chip and
> then invokes handle_nested_irq() for each of the pending bits.
>
> Those interrupts cannot be shared even if the driver request them as
> shared:
>
>          irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
>          raw_spin_unlock_irq(&desc->lock);
>
>          action_ret = action->thread_fn(action->irq, action->dev_id);
>          if (!noirqdebug)
>                  note_interrupt(irq, desc, action_ret);
>
>          raw_spin_lock_irq(&desc->lock);
>          irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
>
> So there is no loop over action->next. And even if that code would
> loop over action next, then it still would be serialized
>
> main irq is raised
>
>       -> wake thread
>
> thread runs
>
>         read pending reg()
>
>         for each pending bit {
>
>         	   handle_nested_irq();
> 		action = desc->action;
>
> 		while (action) {
> 		      action->thread_fn()
> 		      action = action->next)
> 		}
> 		note_interrupt();
>         }
>
> thread done
>
> Hope that helps. Thanks,
>
>       tglx
>
>        
>
>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip:irq/core] genirq: Sanitize spurious interrupt detection of threaded irqs
  2013-03-07 13:53 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Thomas Gleixner
  2013-03-08 14:47 ` Till Straumann
@ 2014-05-03 21:19 ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-05-03 21:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, strauman, austin, mkl, socketcan, wg,
	pisa, tglx

Commit-ID:  1e77d0a1ed7417d2a5a52a7b8d32aea1833faa6c
Gitweb:     http://git.kernel.org/tip/1e77d0a1ed7417d2a5a52a7b8d32aea1833faa6c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 7 Mar 2013 14:53:45 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 3 May 2014 23:15:39 +0200

genirq: Sanitize spurious interrupt detection of threaded irqs

Till reported that the spurious interrupt detection of threaded
interrupts is broken in two ways:

- note_interrupt() is called for each action thread of a shared
  interrupt line. That's wrong as we are only interested whether none
  of the device drivers felt responsible for the interrupt, but by
  calling multiple times for a single interrupt line we account
  IRQ_NONE even if one of the drivers felt responsible.

- note_interrupt() when called from the thread handler is not
  serialized. That leaves the members of irq_desc which are used for
  the spurious detection unprotected.

To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.

If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return. 

If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.

If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.

Reported-by: Till Straumann <strauman@slac.stanford.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Austin Schuh <austin@peloton-tech.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-can@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1303071450130.22263@ionos

---
 include/linux/irqdesc.h |   4 ++
 kernel/irq/manage.c     |   4 +-
 kernel/irq/spurious.c   | 106 ++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 108 insertions(+), 6 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 26e2661..472c021 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -27,6 +27,8 @@ struct irq_desc;
  * @irq_count:		stats field to detect stalled irqs
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
+ * @threads_handled:	stats field for deferred spurious detection of threaded handlers
+ * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
  * @lock:		locking for SMP
  * @affinity_hint:	hint to user space for preferred irq affinity
  * @affinity_notify:	context for notification of affinity changes
@@ -52,6 +54,8 @@ struct irq_desc {
 	unsigned int		irq_count;	/* For detecting broken IRQs */
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
+	atomic_t		threads_handled;
+	int			threads_handled_last;
 	raw_spinlock_t		lock;
 	struct cpumask		*percpu_enabled;
 #ifdef CONFIG_SMP
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index d34131c..3dc6a61 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -886,8 +886,8 @@ static int irq_thread(void *data)
 		irq_thread_check_affinity(desc, action);
 
 		action_ret = handler_fn(desc, action);
-		if (!noirqdebug)
-			note_interrupt(action->irq, desc, action_ret);
+		if (action_ret == IRQ_HANDLED)
+			atomic_inc(&desc->threads_handled);
 
 		wake_threads_waitq(desc);
 	}
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index a1d8cc6..e2514b0 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -270,6 +270,8 @@ try_misrouted_irq(unsigned int irq, struct irq_desc *desc,
 	return action && (action->flags & IRQF_IRQPOLL);
 }
 
+#define SPURIOUS_DEFERRED	0x80000000
+
 void note_interrupt(unsigned int irq, struct irq_desc *desc,
 		    irqreturn_t action_ret)
 {
@@ -277,15 +279,111 @@ void note_interrupt(unsigned int irq, struct irq_desc *desc,
 	    irq_settings_is_polled(desc))
 		return;
 
-	/* we get here again via the threaded handler */
-	if (action_ret == IRQ_WAKE_THREAD)
-		return;
-
 	if (bad_action_ret(action_ret)) {
 		report_bad_irq(irq, desc, action_ret);
 		return;
 	}
 
+	/*
+	 * We cannot call note_interrupt from the threaded handler
+	 * because we need to look at the compound of all handlers
+	 * (primary and threaded). Aside of that in the threaded
+	 * shared case we have no serialization against an incoming
+	 * hardware interrupt while we are dealing with a threaded
+	 * result.
+	 *
+	 * So in case a thread is woken, we just note the fact and
+	 * defer the analysis to the next hardware interrupt.
+	 *
+	 * The threaded handlers store whether they sucessfully
+	 * handled an interrupt and we check whether that number
+	 * changed versus the last invocation.
+	 *
+	 * We could handle all interrupts with the delayed by one
+	 * mechanism, but for the non forced threaded case we'd just
+	 * add pointless overhead to the straight hardirq interrupts
+	 * for the sake of a few lines less code.
+	 */
+	if (action_ret & IRQ_WAKE_THREAD) {
+		/*
+		 * There is a thread woken. Check whether one of the
+		 * shared primary handlers returned IRQ_HANDLED. If
+		 * not we defer the spurious detection to the next
+		 * interrupt.
+		 */
+		if (action_ret == IRQ_WAKE_THREAD) {
+			int handled;
+			/*
+			 * We use bit 31 of thread_handled_last to
+			 * denote the deferred spurious detection
+			 * active. No locking necessary as
+			 * thread_handled_last is only accessed here
+			 * and we have the guarantee that hard
+			 * interrupts are not reentrant.
+			 */
+			if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
+				desc->threads_handled_last |= SPURIOUS_DEFERRED;
+				return;
+			}
+			/*
+			 * Check whether one of the threaded handlers
+			 * returned IRQ_HANDLED since the last
+			 * interrupt happened.
+			 *
+			 * For simplicity we just set bit 31, as it is
+			 * set in threads_handled_last as well. So we
+			 * avoid extra masking. And we really do not
+			 * care about the high bits of the handled
+			 * count. We just care about the count being
+			 * different than the one we saw before.
+			 */
+			handled = atomic_read(&desc->threads_handled);
+			handled |= SPURIOUS_DEFERRED;
+			if (handled != desc->threads_handled_last) {
+				action_ret = IRQ_HANDLED;
+				/*
+				 * Note: We keep the SPURIOUS_DEFERRED
+				 * bit set. We are handling the
+				 * previous invocation right now.
+				 * Keep it for the current one, so the
+				 * next hardware interrupt will
+				 * account for it.
+				 */
+				desc->threads_handled_last = handled;
+			} else {
+				/*
+				 * None of the threaded handlers felt
+				 * responsible for the last interrupt
+				 *
+				 * We keep the SPURIOUS_DEFERRED bit
+				 * set in threads_handled_last as we
+				 * need to account for the current
+				 * interrupt as well.
+				 */
+				action_ret = IRQ_NONE;
+			}
+		} else {
+			/*
+			 * One of the primary handlers returned
+			 * IRQ_HANDLED. So we don't care about the
+			 * threaded handlers on the same line. Clear
+			 * the deferred detection bit.
+			 *
+			 * In theory we could/should check whether the
+			 * deferred bit is set and take the result of
+			 * the previous run into account here as
+			 * well. But it's really not worth the
+			 * trouble. If every other interrupt is
+			 * handled we never trigger the spurious
+			 * detector. And if this is just the one out
+			 * of 100k unhandled ones which is handled
+			 * then we merily delay the spurious detection
+			 * by one hard interrupt. Not a real problem.
+			 */
+			desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
+		}
+	}
+
 	if (unlikely(action_ret == IRQ_NONE)) {
 		/*
 		 * If we are seeing only the odd spurious IRQ caused by

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-28 20:20             ` Austin Schuh
@ 2014-04-28 20:44               ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2014-04-28 20:44 UTC (permalink / raw)
  To: Austin Schuh
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, 28 Apr 2014, Austin Schuh wrote:
> On Mon, Apr 7, 2014 at 1:08 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> > On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> On Mon, 7 Apr 2014, Austin Schuh wrote:
> >>> You originally sent the patch out.  I could send your patch out back
> >>> to you, but that feels a bit weird ;)
> >>
> >> Wheee. Let me dig in my archives ....
> >
> > https://lkml.org/lkml/2013/3/7/222 in case that helps.
> 
> Did you find the patch?  I didn't see anything go by (but I'm not on
> the main mailing list and didn't find anything with a quick Google
> search.)  It would be nice to not need to run a custom kernel to keep
> my machine running.  I have what is probably a year split between 2
> machines of runtime with the patch applied, and I haven't seen any
> problems with it.

It's on my list, but Easter and traveling did not really help :)

There are a few issues with the patch I need to think through, but
I'll get to it in the next few days.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-07 20:08           ` Austin Schuh
@ 2014-04-28 20:20             ` Austin Schuh
  2014-04-28 20:44               ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Austin Schuh @ 2014-04-28 20:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, Apr 7, 2014 at 1:08 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Mon, 7 Apr 2014, Austin Schuh wrote:
>>> You originally sent the patch out.  I could send your patch out back
>>> to you, but that feels a bit weird ;)
>>
>> Wheee. Let me dig in my archives ....
>
> https://lkml.org/lkml/2013/3/7/222 in case that helps.

Did you find the patch?  I didn't see anything go by (but I'm not on
the main mailing list and didn't find anything with a quick Google
search.)  It would be nice to not need to run a custom kernel to keep
my machine running.  I have what is probably a year split between 2
machines of runtime with the patch applied, and I haven't seen any
problems with it.

Austin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-07 20:07         ` Thomas Gleixner
@ 2014-04-07 20:08           ` Austin Schuh
  2014-04-28 20:20             ` Austin Schuh
  0 siblings, 1 reply; 14+ messages in thread
From: Austin Schuh @ 2014-04-07 20:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, Apr 7, 2014 at 1:07 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 7 Apr 2014, Austin Schuh wrote:
>> You originally sent the patch out.  I could send your patch out back
>> to you, but that feels a bit weird ;)
>
> Wheee. Let me dig in my archives ....

https://lkml.org/lkml/2013/3/7/222 in case that helps.

Austin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-07 20:05       ` Austin Schuh
@ 2014-04-07 20:07         ` Thomas Gleixner
  2014-04-07 20:08           ` Austin Schuh
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2014-04-07 20:07 UTC (permalink / raw)
  To: Austin Schuh
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, 7 Apr 2014, Austin Schuh wrote:

> On Mon, Apr 7, 2014 at 11:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 7 Apr 2014, Austin Schuh wrote:
> >
> >> Hi Thomas,
> >>
> >> Did anything come of this patch?  Both Oliver and I have found that it
> >> fixes real problems.  I have multiple machines which have been running
> >> with the patch since December with no ill effects.
> >
> > No, sorry. It fell through the cracks. Care to resend ?
> >
> > Thanks,
> >
> >         tglx
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-can" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> You originally sent the patch out.  I could send your patch out back
> to you, but that feels a bit weird ;)

Wheee. Let me dig in my archives ....

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-07 18:41     ` Thomas Gleixner
@ 2014-04-07 20:05       ` Austin Schuh
  2014-04-07 20:07         ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Austin Schuh @ 2014-04-07 20:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, Apr 7, 2014 at 11:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 7 Apr 2014, Austin Schuh wrote:
>
>> Hi Thomas,
>>
>> Did anything come of this patch?  Both Oliver and I have found that it
>> fixes real problems.  I have multiple machines which have been running
>> with the patch since December with no ill effects.
>
> No, sorry. It fell through the cracks. Care to resend ?
>
> Thanks,
>
>         tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

You originally sent the patch out.  I could send your patch out back
to you, but that feels a bit weird ;)

Austin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
  2014-04-07 18:38   ` [PATCH] " Austin Schuh
@ 2014-04-07 18:41     ` Thomas Gleixner
  2014-04-07 20:05       ` Austin Schuh
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2014-04-07 18:41 UTC (permalink / raw)
  To: Austin Schuh
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

On Mon, 7 Apr 2014, Austin Schuh wrote:

> Hi Thomas,
> 
> Did anything come of this patch?  Both Oliver and I have found that it
> fixes real problems.  I have multiple machines which have been running
> with the patch since December with no ill effects.

No, sorry. It fell through the cracks. Care to resend ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
       [not found] ` <52CAB05F.4010303@hartkopp.net>
@ 2014-04-07 18:38   ` Austin Schuh
  2014-04-07 18:41     ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Austin Schuh @ 2014-04-07 18:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, linux-can, linux-kernel

Hi Thomas,

Did anything come of this patch?  Both Oliver and I have found that it
fixes real problems.  I have multiple machines which have been running
with the patch since December with no ill effects.

Thanks,
  Austin

On Mon, Jan 6, 2014 at 5:32 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> Hi Thomas,
>
> I just wanted to add my
>
> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
>
> In my setup with Core i7 and 20 CAN busses SJA1000 PCIe the problem
> disappeared with the discussed patch with the -rt kernel.
>
> The system was running at full CAN bus load over the weekend more than 72
> hours of operation without problems:
>
>            CPU0       CPU1       CPU2       CPU3
>   0:         40          0          0          0   IO-APIC-edge      timer
>   1:          1          0          0          0   IO-APIC-edge      i8042
>   8:          0          0          1          0   IO-APIC-edge      rtc0
>   9:         42         45         45         42   IO-APIC-fasteoi   acpi
>  16:          9          8          8          8   IO-APIC-fasteoi   ahci, ehci_hcd:usb1, can4, can5, can6, can7
>  17:  441468642  443275488  443609061  441436145   IO-APIC-fasteoi   can8, can10, can11, can9
>  18:  441975412  438811422  437317802  441209092   IO-APIC-fasteoi   can12, can13, can14, can15
>  19:  427310388  428661677  429813687  428095739   IO-APIC-fasteoi   can0, can1, can2, can3, can16, can17, can18, can19
> (..)
>
> Before the having the patch, it lasted 1 minutes to 1.5 hours (usually ~3
> minutes) until the irq was killed due to the spurious detection using Linux
> 3.10.11-rt (Debian linux-image-3.10-0.bpo.3-rt-686-pae).
>
> I also tested the patch on different latest 3.13-rc5+ (non-rt) kernels for two
> weeks now without problems.
>
> If you want me to test an improved version (as Austin suggested below) please
> send a patch.
>
> Best regards,
> Oliver
>
> On 23.12.2013 20:25, Austin Schuh wrote:
>> Hi Thomas,
>>
>> Did anything happen with your patch to note_interrupt, originally
>> posted on May 8th of 2013?  (https://lkml.org/lkml/2013/3/7/222)
>>
>> I am seeing an issue on a machine right now running a
>> config-preempt-rt kernel and a SJA1000 CAN card from PEAK.  It works
>> for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
>> message.  I posted on the Linux CAN mailing list, and Oliver Hartkopp
>> was able to reproduce the issue only on a realtime kernel.  A function
>> trace ending when the IRQ was disabled shows that note_interrupt is
>> being called regularly from the IRQ handler threads, and one of the
>> threads is doing work (and therefore calling note_interrupt with
>> IRQ_HANDLED).
>>
>> Oliver Hartkopp and I ran tests over the weekend on numerous machines
>> and verified that the patch that you proposed fixes the problem.  We
>> think that the race condition that Till reported is causing the
>> problem here.
>>
>> In reply to the comment about using the upper bit of
>> threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
>> that may still be an over-optimization, the code should still work.
>> All comparisons are done with the bit set, which just makes it a 31
>> bit counter.  It will take 8 more days for the counter to overflow on
>> my machine, so I won't know for certain until then.
>>
>> My only concern is that there may still be a small race condition with
>> this new code.  If the interrupt handler thread is running at a
>> realtime priority, but lower than another task, it may not get run
>> until a large number of IRQs get triggered, and then process them
>> quickly.  With your new handler code, this would be counted as one
>> single handled interrupt.  With the current constants, this is only a
>> problem if more than 1000 calls to the handler happen between IRQs.  I
>> starved my card's irq threads by running 4 tasks at a higher realtime
>> priority than the handler threads, and saw the number of unhandled
>> IRQs jump from 1/100000 to 3/100000, so that problem may not show up
>> in practice.
>>
>> Austin Schuh
>>
>> Tested-by: Austin Schuh <austin@peloton-tech.com>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-05-03 21:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-07 13:53 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Thomas Gleixner
2013-03-08 14:47 ` Till Straumann
2013-03-08 16:12   ` Thomas Gleixner
2013-03-08 17:19     ` Till Straumann
2013-03-08 19:41       ` Thomas Gleixner
2013-03-12 13:22         ` Till Straumann
2014-05-03 21:19 ` [tip:irq/core] " tip-bot for Thomas Gleixner
     [not found] <CANGgnMbszHzYe9pF2C6wag4MY_PfBG2qrMCC=rMmQnb-jyXXXw@mail.gmail.com>
     [not found] ` <52CAB05F.4010303@hartkopp.net>
2014-04-07 18:38   ` [PATCH] " Austin Schuh
2014-04-07 18:41     ` Thomas Gleixner
2014-04-07 20:05       ` Austin Schuh
2014-04-07 20:07         ` Thomas Gleixner
2014-04-07 20:08           ` Austin Schuh
2014-04-28 20:20             ` Austin Schuh
2014-04-28 20:44               ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).