linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH -rt] irqd starvation on SMP by a single process?
@ 2006-05-12  2:43 john stultz
  2006-05-12  5:50 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2006-05-12  2:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Thomas Gleixner, Steven Rostedt

Hey Ingo,

	I've been seeing some odd behavior recently w/ the irq smp_affinity
masks. With a single runaway -rt thread, we were getting hard hangs on
an SMP system.

Clearly, if the runaway -rt thread is running on a cpu that is getting
critical interrupts (scsi, network, etc), a system hang would be
expected. However I started playing with the proc/irq/<num>/smp_affinity
masks and got some odd behavior.

Using taskset and a small script that sets all the irq smp_affinity
values, I got the following.

./irqbind.sh 1 ; taskset 1 ./runaway  = HANG (as expected)
./irqbind.sh 1 ; taskset 2 ./runaway  = NO HANG (as expected)

./irqbind.sh 2 ; taskset 1 ./runaway  = NO HANG (as expected)
./irqbind.sh 2 ; taskset 2 ./runaway  = HANG (as expected)

Everything is cool. However, I figured we could set the irq smp_affinity
mask to multiple cpus so a runaway thread could not cause a hang.

./irqbind.sh 3 ; taskset 1 ./runaway  = Sometimes hang
./irqbind.sh 3 ; taskset 2 ./runaway  = !Sometime hang

Basically it appeared that while the irq smp_affinity mask was set for
both cpus, the irq was really being delivered to only one cpu. Looking
at the code, sure enough, that is the case.

The patch below appears to correct this issue, however it also
repeatedly(on different irqs) causes the following BUG:

BUG: using smp_processor_id() in preemptible [00000000] code: IRQ 24/1518
caller is __do_softirq+0x10/0x51					 
 [<c0263bc3>] debug_smp_processor_id+0x7b/0x88 (8)
 [<c0126936>] __do_softirq+0x10/0x51 (12)	  
 [<c0141195>] do_irqd+0xd8/0x10e (12)	 
 [<c013250f>] kthread+0x7c/0xa6 (24) 
 [<c0132493>] kthread+0x0/0xa6 (20) 
 [<c0101041>] kernel_thread_helper+0x5/0xb (16)
---------------------------		       
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0263b83>] .... debug_smp_processor_id+0x3b/0x88
.....[<00000000>] ..   ( <= rest_init+0x3feffd64/0x3d)
						      
------------------------------
| showing all locks held by: |	(IRQ 24/1518 [f7b11070,  60]):


Thoughts?
-john


--- 2.6-rt/kernel/irq/manage.c	2006-05-11 18:37:36.000000000 -0500
+++ dev-rt/kernel/irq/manage.c	2006-05-11 21:09:10.000000000 -0500
@@ -731,10 +731,8 @@
 		/*
 		 * Did IRQ affinities change?
 		 */
-		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
-			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
-			set_cpus_allowed(current, mask);
-		}
+		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
+			set_cpus_allowed(current, irq_affinity[irq]);
 #endif
 		schedule();
 	}



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  2:43 [RFC][PATCH -rt] irqd starvation on SMP by a single process? john stultz
@ 2006-05-12  5:50 ` Ingo Molnar
  2006-05-12  7:59   ` Steven Rostedt
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ingo Molnar @ 2006-05-12  5:50 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Thomas Gleixner, Steven Rostedt


* john stultz <johnstul@us.ibm.com> wrote:

> -		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> -			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> -			set_cpus_allowed(current, mask);
> -		}

this is intentional, not a bug. The point of the code above is to ensure 
that every IRQ handler is executed on one CPU. I.e. the irq threads are 
semi-affine - they are strictly affine when executing a handler, but 
they may switch CPUs if the affinity mask points it elsewhere.

but ... we might be able to relax that. Potentially some IRQ handlers 
might assume per-cpu-ness, but that should be uncovered by 
DEBUG_PREEMPT's smp_processor_id() checks.

> +		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> +			set_cpus_allowed(current, irq_affinity[irq]);

> The patch below appears to correct this issue, however it also
> repeatedly(on different irqs) causes the following BUG:

ah. This actually uncovered a real bug. We were calling __do_softirq() 
with interrupts enabled (and being preemptible) - which is certainly 
bad.

this was hidden before because the smp_processor_id() debugging code 
handles tasks bound to a single CPU as per-cpu-safe.

could you check the (totally untested) patch below and see if that fixes 
things for you? I've also added your affinity change.

	Ingo

Index: linux-rt.q/kernel/irq/manage.c
===================================================================
--- linux-rt.q.orig/kernel/irq/manage.c
+++ linux-rt.q/kernel/irq/manage.c
@@ -717,24 +717,21 @@ static int do_irqd(void * __desc)
 	if (param.sched_priority > 25)
 		curr_irq_prio = param.sched_priority - 1;
 
-//	param.sched_priority = 1;
 	sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
 
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		do_hardirq(desc);
 		cond_resched_all();
+		local_irq_disable();
 		__do_softirq();
-//		do_softirq_from_hardirq();
 		local_irq_enable();
 #ifdef CONFIG_SMP
 		/*
 		 * Did IRQ affinities change?
 		 */
-		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
-			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
-			set_cpus_allowed(current, mask);
-		}
+		if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
+			set_cpus_allowed(current, irq_affinity[irq]);
 #endif
 		schedule();
 	}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  5:50 ` Ingo Molnar
@ 2006-05-12  7:59   ` Steven Rostedt
  2006-05-12  8:10     ` Ingo Molnar
  2006-05-12 11:39     ` Mark Hounschell
  2006-05-12 17:47   ` john stultz
  2006-05-12 18:04   ` john stultz
  2 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2006-05-12  7:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: john stultz, lkml, Mark Hounschell, Thomas Gleixner



On Fri, 12 May 2006, Ingo Molnar wrote:

> ah. This actually uncovered a real bug. We were calling __do_softirq()
> with interrupts enabled (and being preemptible) - which is certainly
> bad.

Hmm, I wonder if this is also affecting Mark's problem.

But since I showed that if hardirqs_disabled and running PREEMPT not
PREEMPT_RT, disable_irq can call schedule.  This is done in
drivers/net/3c59x.c.  It has a watchdog timeout calling disable_irq, which
calls synchronize_irq which might schedule:

void synchronize_irq(unsigned int irq)
{
	struct irq_desc *desc = irq_desc + irq;

	if (irq >= NR_IRQS)
		return;

	if (hardirq_preemption && !(desc->status & IRQ_NODELAY))
		wait_event(desc->wait_for_handler,
			!(desc->status & IRQ_INPROGRESS));
	else
		while (desc->status & IRQ_INPROGRESS)
			cpu_relax();
}

-- Steve

>
> this was hidden before because the smp_processor_id() debugging code
> handles tasks bound to a single CPU as per-cpu-safe.
>
> could you check the (totally untested) patch below and see if that fixes
> things for you? I've also added your affinity change.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  7:59   ` Steven Rostedt
@ 2006-05-12  8:10     ` Ingo Molnar
  2006-05-12 11:39     ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2006-05-12  8:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: john stultz, lkml, Mark Hounschell, Thomas Gleixner


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 12 May 2006, Ingo Molnar wrote:
> 
> > ah. This actually uncovered a real bug. We were calling __do_softirq()
> > with interrupts enabled (and being preemptible) - which is certainly
> > bad.
> 
> Hmm, I wonder if this is also affecting Mark's problem.
> 
> But since I showed that if hardirqs_disabled and running PREEMPT not 
> PREEMPT_RT, disable_irq can call schedule.  This is done in 
> drivers/net/3c59x.c.  It has a watchdog timeout calling disable_irq, 
> which calls synchronize_irq which might schedule:
> 
> void synchronize_irq(unsigned int irq)
> {
> 	struct irq_desc *desc = irq_desc + irq;
> 
> 	if (irq >= NR_IRQS)
> 		return;
> 
> 	if (hardirq_preemption && !(desc->status & IRQ_NODELAY))
> 		wait_event(desc->wait_for_handler,
> 			!(desc->status & IRQ_INPROGRESS));
> 	else
> 		while (desc->status & IRQ_INPROGRESS)
> 			cpu_relax();
> }
> 
> -- Steve
> 
> >
> > this was hidden before because the smp_processor_id() debugging code
> > handles tasks bound to a single CPU as per-cpu-safe.
> >
> > could you check the (totally untested) patch below and see if that fixes
> > things for you? I've also added your affinity change.
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  7:59   ` Steven Rostedt
  2006-05-12  8:10     ` Ingo Molnar
@ 2006-05-12 11:39     ` Mark Hounschell
  2006-05-12 11:56       ` Ingo Molnar
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 11:39 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, john stultz, lkml, Thomas Gleixner

Steven Rostedt wrote:
> 
> On Fri, 12 May 2006, Ingo Molnar wrote:
> 
>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>> with interrupts enabled (and being preemptible) - which is certainly
>> bad.
> 
> Hmm, I wonder if this is also affecting Mark's problem.
> 

I thought the same thing when I read it??

Mark


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 11:39     ` Mark Hounschell
@ 2006-05-12 11:56       ` Ingo Molnar
  2006-05-12 11:59         ` Mark Hounschell
  2006-05-12 12:53         ` Mark Hounschell
  0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2006-05-12 11:56 UTC (permalink / raw)
  To: Mark Hounschell; +Cc: Steven Rostedt, john stultz, lkml, Thomas Gleixner


* Mark Hounschell <markh@compro.net> wrote:

> Steven Rostedt wrote:
> > 
> > On Fri, 12 May 2006, Ingo Molnar wrote:
> > 
> >> ah. This actually uncovered a real bug. We were calling __do_softirq()
> >> with interrupts enabled (and being preemptible) - which is certainly
> >> bad.
> > 
> > Hmm, I wonder if this is also affecting Mark's problem.
> 
> I thought the same thing when I read it??

could you try the patch below?

	Ingo

----

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > one solution would be to forbid disable_irq() from softirq contexts, and
> > to convert the vortex timeout function to a workqueue and use the
> > *_delayed_work() APIs to drive it - and cross fingers there's not many
> > places to fix.
> 
> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in 
> disable_irq.
> 
> But I must admit, I wouldn't know how to make that change without 
> spending more time on it then I have for this.

the simplest fix for now would be to use the _nosync variant in the 
vortex timeout function.

Mark, does this fix the problem?

	Ingo

Index: linux-rt.q/drivers/net/3c59x.c
===================================================================
--- linux-rt.q.orig/drivers/net/3c59x.c
+++ linux-rt.q/drivers/net/3c59x.c
@@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
 
 	if (vp->medialock)
 		goto leave_media_alone;
-	disable_irq(dev->irq);
+	/* hack! */
+	disable_irq_nosync(dev->irq);
 	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
 	EL3WINDOW(4);
 	media_status = ioread16(ioaddr + Wn4_Media);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 11:56       ` Ingo Molnar
@ 2006-05-12 11:59         ` Mark Hounschell
  2006-05-12 12:53         ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 11:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Steven Rostedt, john stultz, lkml, Thomas Gleixner

Ingo Molnar wrote:
> * Mark Hounschell <markh@compro.net> wrote:
> 
>> Steven Rostedt wrote:
>>> On Fri, 12 May 2006, Ingo Molnar wrote:
>>>
>>>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>>>> with interrupts enabled (and being preemptible) - which is certainly
>>>> bad.
>>> Hmm, I wonder if this is also affecting Mark's problem.
>> I thought the same thing when I read it??
> 
> could you try the patch below?
> 
> 	Ingo
> 
> ----
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> one solution would be to forbid disable_irq() from softirq contexts, and
>>> to convert the vortex timeout function to a workqueue and use the
>>> *_delayed_work() APIs to drive it - and cross fingers there's not many
>>> places to fix.
>> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in 
>> disable_irq.
>>
>> But I must admit, I wouldn't know how to make that change without 
>> spending more time on it then I have for this.
> 
> the simplest fix for now would be to use the _nosync variant in the 
> vortex timeout function.
> 
> Mark, does this fix the problem?
> 
> 	Ingo
> 
> Index: linux-rt.q/drivers/net/3c59x.c
> ===================================================================
> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>  
>  	if (vp->medialock)
>  		goto leave_media_alone;
> -	disable_irq(dev->irq);
> +	/* hack! */
> +	disable_irq_nosync(dev->irq);
>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>  	EL3WINDOW(4);
>  	media_status = ioread16(ioaddr + Wn4_Media);
> 

I'm trying that right this moment. It may take a while before I know if
it helps.

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 11:56       ` Ingo Molnar
  2006-05-12 11:59         ` Mark Hounschell
@ 2006-05-12 12:53         ` Mark Hounschell
  2006-05-12 13:03           ` Steven Rostedt
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 12:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Steven Rostedt, john stultz, lkml, Thomas Gleixner

Ingo Molnar wrote:
> * Mark Hounschell <markh@compro.net> wrote:
> 
>> Steven Rostedt wrote:
>>> On Fri, 12 May 2006, Ingo Molnar wrote:
>>>
>>>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>>>> with interrupts enabled (and being preemptible) - which is certainly
>>>> bad.
>>> Hmm, I wonder if this is also affecting Mark's problem.
>> I thought the same thing when I read it??
> 
> could you try the patch below?
> 
> 	Ingo
> 
> ----
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> one solution would be to forbid disable_irq() from softirq contexts, and
>>> to convert the vortex timeout function to a workqueue and use the
>>> *_delayed_work() APIs to drive it - and cross fingers there's not many
>>> places to fix.
>> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in 
>> disable_irq.
>>
>> But I must admit, I wouldn't know how to make that change without 
>> spending more time on it then I have for this.
> 
> the simplest fix for now would be to use the _nosync variant in the 
> vortex timeout function.
> 
> Mark, does this fix the problem?
> 
> 	Ingo
> 
> Index: linux-rt.q/drivers/net/3c59x.c
> ===================================================================
> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>  
>  	if (vp->medialock)
>  		goto leave_media_alone;
> -	disable_irq(dev->irq);
> +	/* hack! */
> +	disable_irq_nosync(dev->irq);
>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>  	EL3WINDOW(4);
>  	media_status = ioread16(ioaddr + Wn4_Media);
> 

Yes it does.

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 12:53         ` Mark Hounschell
@ 2006-05-12 13:03           ` Steven Rostedt
  2006-05-12 13:06             ` Mark Hounschell
  2006-05-12 13:19             ` Mark Hounschell
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2006-05-12 13:03 UTC (permalink / raw)
  To: Mark Hounschell; +Cc: Ingo Molnar, john stultz, lkml, Thomas Gleixner


On Fri, 12 May 2006, Mark Hounschell wrote:

> >
> > Mark, does this fix the problem?
> >
> > 	Ingo
> >
> > Index: linux-rt.q/drivers/net/3c59x.c
> > ===================================================================
> > --- linux-rt.q.orig/drivers/net/3c59x.c
> > +++ linux-rt.q/drivers/net/3c59x.c
> > @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
> >
> >  	if (vp->medialock)
> >  		goto leave_media_alone;
> > -	disable_irq(dev->irq);
> > +	/* hack! */
> > +	disable_irq_nosync(dev->irq);
> >  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> >  	EL3WINDOW(4);
> >  	media_status = ioread16(ioaddr + Wn4_Media);
> >
>
> Yes it does.
>


It fixes it for both "complete preemption" and "normal preemption"?

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 13:03           ` Steven Rostedt
@ 2006-05-12 13:06             ` Mark Hounschell
  2006-05-12 13:19             ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 13:06 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, john stultz, lkml, Thomas Gleixner

Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
> 
>>> Mark, does this fix the problem?
>>>
>>> 	Ingo
>>>
>>> Index: linux-rt.q/drivers/net/3c59x.c
>>> ===================================================================
>>> --- linux-rt.q.orig/drivers/net/3c59x.c
>>> +++ linux-rt.q/drivers/net/3c59x.c
>>> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>>>
>>>  	if (vp->medialock)
>>>  		goto leave_media_alone;
>>> -	disable_irq(dev->irq);
>>> +	/* hack! */
>>> +	disable_irq_nosync(dev->irq);
>>>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>>>  	EL3WINDOW(4);
>>>  	media_status = ioread16(ioaddr + Wn4_Media);
>>>
>> Yes it does.
>>
> 
> 
> It fixes it for both "complete preemption" and "normal preemption"?
> 
> -- Steve
> 
> 

Normal for sure. I just rebooted to test complete.

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 13:03           ` Steven Rostedt
  2006-05-12 13:06             ` Mark Hounschell
@ 2006-05-12 13:19             ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 13:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, john stultz, lkml, Thomas Gleixner

Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
> 
>>> Mark, does this fix the problem?
>>>
>>> 	Ingo
>>>
>>> Index: linux-rt.q/drivers/net/3c59x.c
>>> ===================================================================
>>> --- linux-rt.q.orig/drivers/net/3c59x.c
>>> +++ linux-rt.q/drivers/net/3c59x.c
>>> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>>>
>>>  	if (vp->medialock)
>>>  		goto leave_media_alone;
>>> -	disable_irq(dev->irq);
>>> +	/* hack! */
>>> +	disable_irq_nosync(dev->irq);
>>>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>>>  	EL3WINDOW(4);
>>>  	media_status = ioread16(ioaddr + Wn4_Media);
>>>
>> Yes it does.
>>
> 
> 
> It fixes it for both "complete preemption" and "normal preemption"?
> 
> -- Steve
> 
> 

Nope, I still have the complete preemption problem. My box locked up
right away. Had to reboot it.

You guys know best but I think the problem the above patch fixes is
probably not related to my 'complete preemption' problem or the problem
reported by John. But my 'complete preemption' problem may certainly be
the same as Johns.

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  5:50 ` Ingo Molnar
  2006-05-12  7:59   ` Steven Rostedt
@ 2006-05-12 17:47   ` john stultz
  2006-05-12 18:04   ` john stultz
  2 siblings, 0 replies; 16+ messages in thread
From: john stultz @ 2006-05-12 17:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Thomas Gleixner, Steven Rostedt

On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
> * john stultz <johnstul@us.ibm.com> wrote:
> > +		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> > +			set_cpus_allowed(current, irq_affinity[irq]);
> 
> > The patch below appears to correct this issue, however it also
> > repeatedly(on different irqs) causes the following BUG:
> 
> ah. This actually uncovered a real bug. We were calling __do_softirq() 
> with interrupts enabled (and being preemptible) - which is certainly 
> bad.
> 
> this was hidden before because the smp_processor_id() debugging code 
> handles tasks bound to a single CPU as per-cpu-safe.
> 
> could you check the (totally untested) patch below and see if that fixes 
> things for you? I've also added your affinity change.

Yep, no BUG messages and I get irq affinity behavior that matches what I
echo into the proc interface.

Looks good to me so far. I'll keep running w/ it and let you know if we
see any issues.

thanks
-john


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12  5:50 ` Ingo Molnar
  2006-05-12  7:59   ` Steven Rostedt
  2006-05-12 17:47   ` john stultz
@ 2006-05-12 18:04   ` john stultz
  2006-05-12 19:16     ` Mark Hounschell
  2006-05-12 20:04     ` Mark Hounschell
  2 siblings, 2 replies; 16+ messages in thread
From: john stultz @ 2006-05-12 18:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Thomas Gleixner, Steven Rostedt

On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
> +		if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> +			set_cpus_allowed(current, irq_affinity[irq]);

Gah! I introduced a terrible bug there. 

Note the semi-colon at the end of the if statement! Sorry about that!

The following patch (which I've tested as well) fixes that.

--- 2.6-rt/kernel/irq/manage.c	2006-05-11 18:37:36.000000000 -0500
+++ dev-rt/kernel/irq/manage.c	2006-05-12 12:55:56.000000000 -0500
@@ -724,6 +724,7 @@
 		set_current_state(TASK_INTERRUPTIBLE);
 		do_hardirq(desc);
 		cond_resched_all();
+		local_irq_disable();
 		__do_softirq();
 //		do_softirq_from_hardirq();
 		local_irq_enable();
@@ -731,10 +732,8 @@
 		/*
 		 * Did IRQ affinities change?
 		 */
-		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
-			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
-			set_cpus_allowed(current, mask);
-		}
+		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
+			set_cpus_allowed(current, irq_affinity[irq]);
 #endif
 		schedule();
 	}





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 18:04   ` john stultz
@ 2006-05-12 19:16     ` Mark Hounschell
  2006-05-12 20:04     ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 19:16 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, lkml, Thomas Gleixner, Steven Rostedt

john stultz wrote:
 > --- 2.6-rt/kernel/irq/manage.c	2006-05-11 18:37:36.000000000 -0500
> +++ dev-rt/kernel/irq/manage.c	2006-05-12 12:55:56.000000000 -0500
> @@ -724,6 +724,7 @@
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		do_hardirq(desc);
>  		cond_resched_all();
> +		local_irq_disable();
>  		__do_softirq();
>  //		do_softirq_from_hardirq();
>  		local_irq_enable();
> @@ -731,10 +732,8 @@
>  		/*
>  		 * Did IRQ affinities change?
>  		 */
> -		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> -			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> -			set_cpus_allowed(current, mask);
> -		}
> +		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
> +			set_cpus_allowed(current, irq_affinity[irq]);
>  #endif
>  		schedule();
>  	}
> 
> 
> 
> 

Steve,

 This patch seems to have made a huge difference in my 'complete
preemption' mode 'pauses' problem.

>From many seconds, even minutes at a time, to 5 seconds and less when it
even does happen???

It does still happen however.

Mark




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH -rt] irqd starvation on SMP by a single process?
  2006-05-12 18:04   ` john stultz
  2006-05-12 19:16     ` Mark Hounschell
@ 2006-05-12 20:04     ` Mark Hounschell
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Hounschell @ 2006-05-12 20:04 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, lkml, Thomas Gleixner, Steven Rostedt

john stultz wrote:
> On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
>> +		if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
>> +			set_cpus_allowed(current, irq_affinity[irq]);
> 
> Gah! I introduced a terrible bug there. 
> 
> Note the semi-colon at the end of the if statement! Sorry about that!
> 
> The following patch (which I've tested as well) fixes that.
> 
> --- 2.6-rt/kernel/irq/manage.c	2006-05-11 18:37:36.000000000 -0500
> +++ dev-rt/kernel/irq/manage.c	2006-05-12 12:55:56.000000000 -0500
> @@ -724,6 +724,7 @@
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		do_hardirq(desc);
>  		cond_resched_all();
> +		local_irq_disable();
>  		__do_softirq();
>  //		do_softirq_from_hardirq();
>  		local_irq_enable();
> @@ -731,10 +732,8 @@
>  		/*
>  		 * Did IRQ affinities change?
>  		 */
> -		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> -			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> -			set_cpus_allowed(current, mask);
> -		}
> +		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
> +			set_cpus_allowed(current, irq_affinity[irq]);
>  #endif
>  		schedule();
>  	}
> 
> 
> 
> 

FYI,

I just looked at rt21 and the first version of this patch seems to be in
it. Not this version.

Mark


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC][PATCH -rt] irqd starvation on SMP by a single process?
@ 2006-05-12  4:22 Edward Killips
  0 siblings, 0 replies; 16+ messages in thread
From: Edward Killips @ 2006-05-12  4:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Thomas Gleixner, Steven Rostedt

Hey Ingo,

	I've been seeing some odd behavior recently w/ the irq smp_affinity
masks. With a single runaway -rt thread, we were getting hard hangs on
an SMP system.

Clearly, if the runaway -rt thread is running on a cpu that is getting
critical interrupts (scsi, network, etc), a system hang would be
expected. However I started playing with the proc/irq/<num>/smp_affinity
masks and got some odd behavior.

Using taskset and a small script that sets all the irq smp_affinity
values, I got the following.

./irqbind.sh 1 ; taskset 1 ./runaway  = HANG (as expected)
./irqbind.sh 1 ; taskset 2 ./runaway  = NO HANG (as expected)

./irqbind.sh 2 ; taskset 1 ./runaway  = NO HANG (as expected)
./irqbind.sh 2 ; taskset 2 ./runaway  = HANG (as expected)

Everything is cool. However, I figured we could set the irq smp_affinity
mask to multiple cpus so a runaway thread could not cause a hang.

./irqbind.sh 3 ; taskset 1 ./runaway  = Sometimes hang
./irqbind.sh 3 ; taskset 2 ./runaway  = !Sometime hang

Basically it appeared that while the irq smp_affinity mask was set for
both cpus, the irq was really being delivered to only one cpu. Looking
at the code, sure enough, that is the case.

The patch below appears to correct this issue, however it also
repeatedly(on different irqs) causes the following BUG:

BUG: using smp_processor_id() in preemptible [00000000] code: IRQ  
24/1518
caller is __do_softirq+0x10/0x51					
  [<c0263bc3>] debug_smp_processor_id+0x7b/0x88 (8)
  [<c0126936>] __do_softirq+0x10/0x51 (12)	
  [<c0141195>] do_irqd+0xd8/0x10e (12)	
  [<c013250f>] kthread+0x7c/0xa6 (24)
  [<c0132493>] kthread+0x0/0xa6 (20)
  [<c0101041>] kernel_thread_helper+0x5/0xb (16)
---------------------------		
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0263b83>] .... debug_smp_processor_id+0x3b/0x88
.....[<00000000>] ..   ( <= rest_init+0x3feffd64/0x3d)
						
------------------------------
| showing all locks held by: |	(IRQ 24/1518 [f7b11070,  60]):


Thoughts?
-john


--- 2.6-rt/kernel/irq/manage.c	2006-05-11 18:37:36.000000000 -0500
+++ dev-rt/kernel/irq/manage.c	2006-05-11 21:09:10.000000000 -0500
@@ -731,10 +731,8 @@
  		/*
  		 * Did IRQ affinities change?
  		 */
-		if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
-			mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
-			set_cpus_allowed(current, mask);
-		}
+		if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
+			set_cpus_allowed(current, irq_affinity[irq]);
  #endif
  		schedule();
  	}


-
To unsubscribe from this list: send the line "unsubscribe linux- 
kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2006-05-12 20:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-12  2:43 [RFC][PATCH -rt] irqd starvation on SMP by a single process? john stultz
2006-05-12  5:50 ` Ingo Molnar
2006-05-12  7:59   ` Steven Rostedt
2006-05-12  8:10     ` Ingo Molnar
2006-05-12 11:39     ` Mark Hounschell
2006-05-12 11:56       ` Ingo Molnar
2006-05-12 11:59         ` Mark Hounschell
2006-05-12 12:53         ` Mark Hounschell
2006-05-12 13:03           ` Steven Rostedt
2006-05-12 13:06             ` Mark Hounschell
2006-05-12 13:19             ` Mark Hounschell
2006-05-12 17:47   ` john stultz
2006-05-12 18:04   ` john stultz
2006-05-12 19:16     ` Mark Hounschell
2006-05-12 20:04     ` Mark Hounschell
2006-05-12  4:22 Edward Killips

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).