linux-kernel.vger.kernel.org archive mirror
* [PATCH RT] fix migrating softirq [cause of network hang]
@ 2007-06-13 12:47 Steven Rostedt
  2007-06-14  7:02 ` Darren Hart
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2007-06-13 12:47 UTC
  To: Ingo Molnar; +Cc: LKML, RT, Thomas Gleixner, john stultz

Softirqs are bound to a single CPU.  That is, once a softirq function
starts to run, it stays on that CPU until it finishes.

In RT, softirqs are threads, and we have a softirq thread per CPU. Each
softirq thread is bound to the single CPU that it represents.

In order to speed things up and lower context switches in RT, if a
softirq thread is of the same priority as an interrupt thread, then when
the interrupt thread is about to exit, it tests to see if any softirq
threads need to be run on that cpu. Instead of running the softirq
thread, it simply performs the functions for the softirq within the
interrupt thread.

The problem is, nothing prevents the interrupt thread from migrating.

So while the interrupt thread is running the softirq function, it may
migrate to another CPU in the middle of that function. This means that
any per-CPU data that the softirq is touching can be corrupted.
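
The failure mode described above can be illustrated with a toy
userspace model. This is only a sketch, not kernel code: pending,
handled, current_cpu and softirq_handler are hypothetical names, and
the "migration" is simulated by changing an index halfway through the
handler.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical per-CPU state: work queued on, and completed by, each CPU. */
static int pending[NR_CPUS];
static int handled[NR_CPUS];

/* CPU the simulated thread is currently running on. */
static int current_cpu;

static void reset_state(int pending_cpu0, int pending_cpu1)
{
	pending[0] = pending_cpu0;
	pending[1] = pending_cpu1;
	handled[0] = handled[1] = 0;
	current_cpu = 0;
}

/*
 * Model of a handler that assumes it stays on one CPU: it reads "this
 * CPU's" queue, then later updates "this CPU's" state again.
 */
static void softirq_handler(int migrate_midway)
{
	int work = pending[current_cpu];

	if (migrate_midway)
		current_cpu = 1 - current_cpu;	/* simulated migration */

	handled[current_cpu] += work;
	pending[current_cpu] = 0;
}
```

With 3 items pending on CPU 0 and 5 on CPU 1, a handler that starts on
CPU 0 and is moved midway leaves CPU 0's 3 items still marked pending
and discards CPU 1's 5 items. Without the simulated migration, only
CPU 0's queue is touched.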

I was experiencing a network hang that would sometimes recover and
sometimes not. Using my logdev debugger, I started to debug this
problem.  I came across this at the moment of the hang:


[  389.131279] cpu:0 (IRQ-11:427) tcp_rcv_established:4056 rcv_nxt=-1665585797
[  389.131615] cpu:1 192.168.23.72:22 <== 192.168.23.60:41352 ack:2629381499 seq:1773074099 (----A-) len:0 win:790 end_seq:1773074099
[  389.131626] cpu:1 (IRQ-11:427) ip_finish_output2:187 dst->hh=ffff81003b213080
[  389.131635] cpu:1 (IRQ-11:427) ip_finish_output2:189 hh_output=ffffffff80429009


Here we see IRQ-11 in the process of finishing up the softirq-net-tx
function. In the middle of it, we receive a packet, and that must have
pushed the interrupt thread over to CPU 1, where it finished up the
softirq. Note that the same thread (IRQ-11:427) logs from cpu:0 in the
first line and from cpu:1 in the last two.
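
A side note on reading the trace: the rcv_nxt value in the first log
line is negative, which suggests (this is an assumption about the
logdev output format) that a 32-bit sequence number was printed with a
signed format. Reinterpreted as unsigned, it equals the ack number of
the packet logged on the next line. A minimal sketch of that
conversion, with as_u32 being a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reinterpret a value that was logged as a signed 32-bit integer as the
 * unsigned 32-bit quantity it most likely represents. Conversion to an
 * unsigned type is defined as reduction modulo 2^32.
 */
static uint32_t as_u32(int32_t logged)
{
	return (uint32_t)logged;
}
```

Here as_u32(-1665585797) is 2629381499, since
4294967296 - 1665585797 = 2629381499, matching ack:2629381499 above.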

This patch temporarily binds the hardirq thread to the CPU on which it
runs the softirqs.  With this patch I have not seen my network hang. I
ran it overnight, doing compiles and such, and it seems fine. Before, I
could cause the hang with various loads within a minute; now I can't
cause it even after several minutes.

I'm assuming that this fix may fix other bugs too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6.21.4-rt12/kernel/softirq.c
===================================================================
--- linux-2.6.21.4-rt12.orig/kernel/softirq.c
+++ linux-2.6.21.4-rt12/kernel/softirq.c
@@ -426,8 +426,25 @@ void do_softirq_from_hardirq(void)
 	current->flags &= ~PF_HARDIRQ;
 	current->flags |= PF_SOFTIRQ;
 
+#ifdef CONFIG_PREEMPT_HARDIRQS
+	{
+	cpumask_t save_cpus_allowed, cpus_allowed;
+
+	/* Don't let ourselves migrate */
+	save_cpus_allowed = current->cpus_allowed;
+	cpus_allowed = cpumask_of_cpu(smp_processor_id());
+	set_cpus_allowed(current, cpus_allowed);
+#endif
+
 	___do_softirq(1);
 
+#ifdef CONFIG_PREEMPT_HARDIRQS
+	/* Put back our original mask. */
+	WARN_ON(!cpus_equal(current->cpus_allowed, cpus_allowed));
+	set_cpus_allowed(current, save_cpus_allowed);
+	}
+#endif
+
 #ifndef CONFIG_PREEMPT_SOFTIRQS
 	trace_softirq_exit();
 #endif
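
For readers who want to experiment outside the kernel, the same
save / pin / run / restore pattern can be sketched in userspace with the
glibc affinity calls. This is an analogue under stated assumptions, not
the code the patch adds: run_pinned and count_allowed_cpus are
hypothetical names, and the fixed-size cpu_set_t assumes a machine with
no more CPUs than CPU_SETSIZE. The pinned mask is cleared before its
single bit is set, since a stack-allocated mask starts with
indeterminate contents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Run fn(arg) with the calling thread pinned to the CPU it is currently
 * on, then restore the original affinity mask.
 * Returns 0 on success, -1 on failure.
 */
static int run_pinned(void (*fn)(void *), void *arg)
{
	cpu_set_t saved, pinned;
	int cpu;

	/* Save the original mask so it can be put back afterwards. */
	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	cpu = sched_getcpu();
	if (cpu < 0)
		return -1;

	/* Build a mask containing only the current CPU. */
	CPU_ZERO(&pinned);
	CPU_SET(cpu, &pinned);
	if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0)
		return -1;

	fn(arg);	/* cannot be migrated while this runs */

	/* Put back the original mask. */
	return sched_setaffinity(0, sizeof(saved), &saved);
}

/* Example callback: store how many CPUs the thread may run on. */
static void count_allowed_cpus(void *arg)
{
	cpu_set_t now;

	if (sched_getaffinity(0, sizeof(now), &now) == 0)
		*(int *)arg = CPU_COUNT(&now);
}
```

Calling run_pinned(count_allowed_cpus, &n) should leave n at 1, and the
thread's affinity mask afterwards should equal the mask it had before
the call.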




* Re: [PATCH RT] fix migrating softirq [cause of network hang]
  2007-06-13 12:47 [PATCH RT] fix migrating softirq [cause of network hang] Steven Rostedt
@ 2007-06-14  7:02 ` Darren Hart
  2007-06-14 13:08   ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Darren Hart @ 2007-06-14  7:02 UTC
  To: Steven Rostedt; +Cc: Ingo Molnar, LKML, RT, Thomas Gleixner, john stultz

On Wednesday 13 June 2007 05:47:16 Steven Rostedt wrote:

> This patch temporarily binds the hardirq thread to the CPU on which it
> runs the softirqs.  With this patch I have not seen my network hang. I
> ran it overnight, doing compiles and such, and it seems fine. Before, I
> could cause the hang with various loads within a minute; now I can't
> cause it even after several minutes.
>
> I'm assuming that this fix may fix other bugs too.
>

I ran 2.6.21-rt10 + this patch with kernbench and then with loops building a
kernel (which previously exhausted memory rapidly).  It successfully built 400
kernels with "make -j 32" on an 8 Opteron machine with 8 GB of RAM and has so
far completed 25 "make -j" builds on the same system.  Looks like a winner to
me so far.

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Realtime Linux Team


* Re: [PATCH RT] fix migrating softirq [cause of network hang]
  2007-06-14  7:02 ` Darren Hart
@ 2007-06-14 13:08   ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2007-06-14 13:08 UTC
  To: Darren Hart; +Cc: Ingo Molnar, LKML, RT, Thomas Gleixner, john stultz

On Thu, 14 Jun 2007, Darren Hart wrote:

> On Wednesday 13 June 2007 05:47:16 Steven Rostedt wrote:
>
> >
> > I'm assuming that this fix may fix other bugs too.
> >
>
> I ran 2.6.21-rt10 + this patch with kernbench and then with loops building a
> kernel (which previously exhausted memory rapidly).  It successfully built 400
> kernels with "make -j 32" on an 8 Opteron machine with 8 GB of RAM and has so
> far completed 25 "make -j" builds on the same system.  Looks like a winner to
> me so far.
>


This is great news! Darren (and team), thanks a lot for testing it.

-- Steve



* Re: [PATCH RT] fix migrating softirq [cause of network hang]
  2007-06-13 22:20   ` Shane
@ 2007-06-13 23:55     ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2007-06-13 23:55 UTC
  To: Shane; +Cc: linux-kernel, linux-rt-users, mingo

On Wed, 13 Jun 2007, Shane wrote:

> On 6/13/07, Steven Rostedt <rostedt@goodmis.org> wrote:
> > > I have applied this patch to 2.6.21.4-rt13 and I think it might have
> > > solved my bizarre nfs hangage issues that cropped up lately. Thanks
> > > for tracking that down it was driving me nutz!
> > >
> > > It has only been a few hours, but if you don't hear back from me in
> > > the next day or so then it's good to go IMVHO. :)
> > >
> >
> > Hi Shane,
> >
> > That's great to hear!  Well, if it does go well, could you post to LKML
> > and linux-rt mailing lists, as well as CC'ing Ingo. Good news like this
> > should be shared ;-)
> >
> > Heh, even if it doesn't go well, you can post to the above too.
>
> Heh, I spoke too soon. This patch doesn't seem to help my problem after all. :/
>

Really?  Darn.

So it seemed to work at first? Just curious as to why you thought it
worked when it didn't.

-- Steve



* Re: [PATCH RT] fix migrating softirq [cause of network hang]
  2007-06-13 21:26 ` Steven Rostedt
@ 2007-06-13 22:20   ` Shane
  2007-06-13 23:55     ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Shane @ 2007-06-13 22:20 UTC
  To: Steven Rostedt; +Cc: linux-kernel, linux-rt-users, mingo

On 6/13/07, Steven Rostedt <rostedt@goodmis.org> wrote:
> > I have applied this patch to 2.6.21.4-rt13 and I think it might have
> > solved my bizarre nfs hangage issues that cropped up lately. Thanks
> > for tracking that down it was driving me nutz!
> >
> > It has only been a few hours, but if you don't hear back from me in
> > the next day or so then it's good to go IMVHO. :)
> >
>
> Hi Shane,
>
> That's great to hear!  Well, if it does go well, could you post to LKML
> and linux-rt mailing lists, as well as CC'ing Ingo. Good news like this
> should be shared ;-)
>
> Heh, even if it doesn't go well, you can post to the above too.

Heh, I spoke too soon. This patch doesn't seem to help my problem after all. :/

Shane


* Re: [PATCH RT] fix migrating softirq [cause of network hang]
  2007-06-13 21:14 Shane
@ 2007-06-13 21:26 ` Steven Rostedt
  2007-06-13 22:20   ` Shane
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2007-06-13 21:26 UTC
  To: Shane; +Cc: linux-kernel

On Wed, 13 Jun 2007, Shane wrote:

> Hi Steven,
>
> I have applied this patch to 2.6.21.4-rt13 and I think it might have
> solved my bizarre nfs hangage issues that cropped up lately. Thanks
> for tracking that down it was driving me nutz!
>
> It has only been a few hours, but if you don't hear back from me in
> the next day or so then it's good to go IMVHO. :)
>

Hi Shane,

That's great to hear!  Well, if it does go well, could you post to LKML
and linux-rt mailing lists, as well as CC'ing Ingo. Good news like this
should be shared ;-)

Heh, even if it doesn't go well, you can post to the above too.

-- Steve



* Re: [PATCH RT] fix migrating softirq [cause of network hang]
@ 2007-06-13 21:14 Shane
  2007-06-13 21:26 ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Shane @ 2007-06-13 21:14 UTC
  To: linux-kernel; +Cc: rostedt

Hi Steven,

I have applied this patch to 2.6.21.4-rt13 and I think it might have
solved my bizarre nfs hangage issues that cropped up lately. Thanks
for tracking that down it was driving me nutz!

It has only been a few hours, but if you don't hear back from me in
the next day or so then it's good to go IMVHO. :)

Shane

