* question about softirqs
@ 2009-05-08 22:51 Chris Friesen
  2009-05-08 23:05 ` David Miller
  2009-05-08 23:34 ` Paul Mackerras
  0 siblings, 2 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-08 22:51 UTC (permalink / raw)
  To: linuxppc-dev

Hi all,

I'm trying to figure out where exactly softirqs are called on return 
from a syscall in 64-bit powerpc.  I can see where they get called for a 
normal interrupt via the irq_exit() path, but not for syscalls.

I'm sure I'm missing something obvious...can anyone help?

Thanks,

Chris

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-08 22:51 question about softirqs Chris Friesen
@ 2009-05-08 23:05 ` David Miller
  2009-05-08 23:34 ` Paul Mackerras
  1 sibling, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-08 23:05 UTC (permalink / raw)
  To: cfriesen; +Cc: linuxppc-dev

From: "Chris Friesen" <cfriesen@nortel.com>
Date: Fri, 08 May 2009 16:51:25 -0600

> I'm trying to figure out where exactly softirqs are called on return
> from a syscall in 64-bit powerpc.  I can see where they get called for
> a normal interrupt via the irq_exit() path, but not for syscalls.
> 
> I'm sure I'm missing something obvious...can anyone help?

I can't see where it does this either, strange.

That would be a very terrible bug if it's not invoking
pending softirqs before return from system calls.

Although, it might be happening via some clever side effect
of how the software managed hardware interrupt stuff works
on powerpc.

* Re: question about softirqs
  2009-05-08 22:51 question about softirqs Chris Friesen
  2009-05-08 23:05 ` David Miller
@ 2009-05-08 23:34 ` Paul Mackerras
  2009-05-08 23:53   ` David Miller
  2009-05-09  0:28   ` Chris Friesen
  1 sibling, 2 replies; 57+ messages in thread
From: Paul Mackerras @ 2009-05-08 23:34 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev

Chris Friesen writes:

> I'm trying to figure out where exactly softirqs are called on return 
> from a syscall in 64-bit powerpc.  I can see where they get called for a 
> normal interrupt via the irq_exit() path, but not for syscalls.

If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq.  So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage).

If the soft irq is raised in interrupt context, then the soft irq gets
run via the do_softirq() call in irq_exit(), as you saw.
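
As a rough illustration of the two paths described above, here is a small
user-space toy model. It is only a sketch: the struct, its fields and the
model_* function names are made up for this example and are not the kernel's
data structures; only the decision logic follows the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-CPU state; field names are illustrative, not the kernel's. */
struct cpu_model {
    uint32_t pending;       /* bitmask of raised softirqs */
    bool in_interrupt;      /* true while handling a hard interrupt */
    bool ksoftirqd_woken;   /* true once the helper thread was asked to run */
    unsigned int handled;   /* number of softirqs run so far */
};

/* Run every pending softirq and clear the mask (stands in for do_softirq()). */
static void model_do_softirq(struct cpu_model *cpu)
{
    while (cpu->pending) {
        cpu->pending &= cpu->pending - 1;   /* clear the lowest set bit */
        cpu->handled++;
    }
}

/* Mark softirq nr pending; in process context, also wake the helper thread. */
static void model_raise_softirq(struct cpu_model *cpu, unsigned int nr)
{
    assert(nr < 32);
    cpu->pending |= UINT32_C(1) << nr;
    if (!cpu->in_interrupt)
        cpu->ksoftirqd_woken = true;        /* processing is deferred */
}

/* Leaving a hard interrupt: anything pending is run right here. */
static void model_irq_exit(struct cpu_model *cpu)
{
    cpu->in_interrupt = false;
    if (cpu->pending)
        model_do_softirq(cpu);
}
```

In this model a raise from process context leaves the bit pending and only
flags the helper thread, while a raise with in_interrupt set is drained by
model_irq_exit().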

The soft irq stuff is pretty much all generic code these days, except
for the code to switch to the softirq stack.

Paul.

* Re: question about softirqs
  2009-05-08 23:34 ` Paul Mackerras
@ 2009-05-08 23:53   ` David Miller
  2009-05-09  2:52     ` Benjamin Herrenschmidt
  2009-05-09  3:31     ` Paul Mackerras
  2009-05-09  0:28   ` Chris Friesen
  1 sibling, 2 replies; 57+ messages in thread
From: David Miller @ 2009-05-08 23:53 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev

From: Paul Mackerras <paulus@samba.org>
Date: Sat, 9 May 2009 09:34:29 +1000

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq.  So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).
> 
> If the soft irq is raised in interrupt context, then the soft irq gets
> run via the do_softirq() call in irq_exit(), as you saw.
> 
> The soft irq stuff is pretty much all generic code these days, except
> for the code to switch to the softirq stack.

Grumble, when did that happen :-(

That's horrible for latency compared to handling it directly
in the trap return path.

* Re: question about softirqs
  2009-05-08 23:34 ` Paul Mackerras
  2009-05-08 23:53   ` David Miller
@ 2009-05-09  0:28   ` Chris Friesen
  1 sibling, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-09  0:28 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Paul Mackerras wrote:

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq.  So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).

I think I see a problem with this.  Suppose I have a SCHED_FIFO task 
spinning on recvmsg() with MSG_DONTWAIT set (and maybe doing other stuff 
if there are no messages).  Under the scenario you described, schedule() 
would re-run the spinning task, no?  This could prevent any incoming 
packets from actually being sent up the stack until we get a real 
hardware interrupt--which could be a whole jiffy if interrupt mitigation 
is enabled in the net device.
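
For concreteness, the kind of polling loop meant here might look like the
sketch below. The helper names poll_once() and spin_recv() and the bounded
spin count are assumptions made for this example; a real task would loop
forever and would be running under SCHED_FIFO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Try once to receive a message without blocking.
 * Returns the byte count, 0 if nothing was queued, or -1 on a real error.
 * (Simplification: a zero-length datagram also shows up as 0.)
 */
static ssize_t poll_once(int fd, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n;

    assert(buf != NULL && len > 0);
    n = recvmsg(fd, &msg, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* nothing queued yet: the caller spins again */
    return n;
}

/*
 * Spin up to max_spins times. The task never sleeps between attempts,
 * so a lower-priority thread on the same CPU gets no chance to run.
 */
static ssize_t spin_recv(int fd, void *buf, size_t len, unsigned int max_spins)
{
    unsigned int i;

    for (i = 0; i < max_spins; i++) {
        ssize_t n = poll_once(fd, buf, len);
        if (n != 0)
            return n;
        /* "other stuff" would go here when there are no messages */
    }
    return 0;
}
```

The point of the sketch is that every recvmsg() call returns immediately, so
the task never blocks and never gives up the CPU voluntarily.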

Chris

* Re: question about softirqs
  2009-05-08 23:53   ` David Miller
@ 2009-05-09  2:52     ` Benjamin Herrenschmidt
  2009-05-09  3:31     ` Paul Mackerras
  1 sibling, 0 replies; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2009-05-09  2:52 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev, paulus


> > The soft irq stuff is pretty much all generic code these days, except
> > for the code to switch to the softirq stack.
> 
> Grumble, when did that happen :-(
> 
> That's horrible for latency compared to handling it directly
> in the trap return path.

If it is indeed such a problem, it would be reasonably easy to
handle it in the return-to-userspace path around the same place
where we test for pending signals (isn't that what we used to do
anyway?)

Cheers,
Ben.

* Re: question about softirqs
  2009-05-08 23:53   ` David Miller
  2009-05-09  2:52     ` Benjamin Herrenschmidt
@ 2009-05-09  3:31     ` Paul Mackerras
  2009-05-09  6:48       ` David Miller
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2009-05-09  3:31 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev

David Miller writes:

> Grumble, when did that happen :-(

Ages ago (i.e. before the switch to git :).  Talk to Ingo, it's his
doing IIRC.

> That's horrible for latency compared to handling it directly
> in the trap return path.

Actually, I don't know why we ever let there be softirqs pending when
we're in process context.  I would think that we should just call
do_softirq immediately if we raise a softirq when !in_interrupt().
But I might be missing some subtlety.

Paul.

* Re: question about softirqs
  2009-05-09  3:31     ` Paul Mackerras
@ 2009-05-09  6:48       ` David Miller
  2009-05-11 18:25         ` Chris Friesen
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2009-05-09  6:48 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev

From: Paul Mackerras <paulus@samba.org>
Date: Sat, 9 May 2009 13:31:23 +1000

> David Miller writes:
> 
>> Grumble, when did that happen :-(
> 
> Ages ago (i.e. before the switch to git :).  Talk to Ingo, it's his
> doing IIRC.

I'll first do some data mining before coming to any (further)
conclusions :-)

>> That's horrible for latency compared to handling it directly
>> in the trap return path.
> 
> Actually, I don't know why we ever let there be softirqs pending when
> we're in process context.  I would think that we should just call
> do_softirq immediately if we raise a softirq when !in_interrupt().
> But I might be missing some subtlety.

I bet it was a non-starter before IRQ stacks.

It does seem like a good idea to me.

You know, for networking over loopback (one of the only real cases
that even matters, if we get a hard interrupt then the return from
that would process any softints), we probably make out just fine
anyways.  As long as we hit a local_bh_enable() (and in the return
path from device transmit that's exceedingly likely as all of the
networking locking is BH safe) we'll run the softints from that and
thus long before we get to syscall return.
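
A toy model of that effect, as a sketch only: the bh_model struct and the
bh_* helpers below are invented for this example and merely mimic the idea
that the outermost BH-enable drains whatever was raised while bottom halves
were disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy state; not the kernel's preempt/softirq counters. */
struct bh_model {
    uint32_t pending;               /* bitmask of raised softirqs */
    unsigned int bh_disable_depth;  /* nesting depth of BH-disabled sections */
    unsigned int handled;           /* number of softirqs run so far */
};

/* Mark softirq nr pending without running anything. */
static void bh_raise(struct bh_model *m, unsigned int nr)
{
    assert(nr < 32);
    m->pending |= UINT32_C(1) << nr;
}

/* Enter a BH-disabled section (sections may nest). */
static void bh_disable(struct bh_model *m)
{
    m->bh_disable_depth++;
}

/* Leave a BH-disabled section; the outermost exit runs pending work. */
static void bh_enable(struct bh_model *m)
{
    assert(m->bh_disable_depth > 0);
    m->bh_disable_depth--;
    if (m->bh_disable_depth == 0) {
        while (m->pending) {
            m->pending &= m->pending - 1;   /* clear the lowest set bit */
            m->handled++;
        }
    }
}
```

In the model, a softirq raised inside a BH-disabled section is handled as soon
as the last bh_enable() runs, with no scheduler involvement at all.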

* Re: question about softirqs
  2009-05-09  6:48       ` David Miller
@ 2009-05-11 18:25         ` Chris Friesen
  2009-05-11 23:24           ` David Miller
  2009-05-11 23:34           ` Paul Mackerras
  0 siblings, 2 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-11 18:25 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev, paulus

David Miller wrote:

> You know, for networking over loopback (one of the only real cases
> that even matters, if we get a hard interrupt then the return from
> that would process any softints), we probably make out just fine
> anyways.  As long as we hit a local_bh_enable() (and in the return
> path from device transmit that's exceedingly likely as all of the
> networking locking is BH safe) we'll run the softints from that and
> thus long before we get to syscall return.

What about the issue I raised earlier?  (I don't think you were copied
at that point.)

Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT
set (and maybe doing other stuff if there are no messages). In this
case, schedule() would re-run the spinning task rather than running
ksoftirqd. This could prevent any incoming packets from actually being
sent up the stack until we get a real hardware interrupt--which could be
a whole jiffy if interrupt mitigation is enabled in the net device.
(And maybe longer if NOHZ is enabled.)

Chris

* Re: question about softirqs
  2009-05-11 18:25         ` Chris Friesen
@ 2009-05-11 23:24           ` David Miller
  2009-05-12  0:43             ` Chris Friesen
  2009-05-11 23:34           ` Paul Mackerras
  1 sibling, 1 reply; 57+ messages in thread
From: David Miller @ 2009-05-11 23:24 UTC (permalink / raw)
  To: cfriesen; +Cc: linuxppc-dev, paulus

From: "Chris Friesen" <cfriesen@nortel.com>
Date: Mon, 11 May 2009 12:25:54 -0600

> David Miller wrote:
> 
>> You know, for networking over loopback (one of the only real cases
>> that even matters, if we get a hard interrupt then the return from
>> that would process any softints), we probably make out just fine
>> anyways.  As long as we hit a local_bh_enable() (and in the return
>> path from device transmit that's exceedingly likely as all of the
>> networking locking is BH safe) we'll run the softints from that and
>> thus long before we get to syscall return.
> 
> What about the issue I raised earlier?  (I don't think you were copied
> at that point.)

I'm sure all of the networking experts on linuxppc-dev will have
an answer.

And yes that was sarcasm :-)  You need to ask this on netdev or similar
list.

* Re: question about softirqs
  2009-05-11 18:25         ` Chris Friesen
  2009-05-11 23:24           ` David Miller
@ 2009-05-11 23:34           ` Paul Mackerras
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Mackerras @ 2009-05-11 23:34 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev, David Miller

Chris Friesen writes:

> Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT
> set (and maybe doing other stuff if there are no messages). In this
> case, schedule() would re-run the spinning task rather than running
> ksoftirqd. This could prevent any incoming packets from actually being
> sent up the stack until we get a real hardware interrupt--which could be
> a whole jiffy if interrupt mitigation is enabled in the net device.

I suggest you ask Ingo Molnar about that.

> (And maybe longer if NOHZ is enabled.)

We still have a timer interrupt every jiffy when stuff is running; we
only turn off the timer interrupts when idle.

Paul.

* Re: question about softirqs
  2009-05-11 23:24           ` David Miller
@ 2009-05-12  0:43             ` Chris Friesen
  2009-05-12  8:12                 ` Ingo Molnar
  0 siblings, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2009-05-12  0:43 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev, Ingo Molnar, paulus, netdev


This started out as a thread on the ppc list, but on the suggestion of
DaveM and Paul Mackerras I'm expanding the receiver list a bit.

Currently, if a softirq is raised in process context the
TIF_RESCHED_PENDING flag gets set and on return to userspace we run the
scheduler, expecting it to switch to ksoftirqd to handle the softirq
processing.

I think I see a possible problem with this. Suppose I have a SCHED_FIFO
task spinning on recvmsg() with MSG_DONTWAIT set. Under the scenario
above, schedule() would re-run the spinning task rather than ksoftirqd,
thus preventing any incoming packets from being sent up the stack until
we get a real hardware interrupt--which could be a whole jiffy if
interrupt mitigation is enabled in the net device.

DaveM pointed out that if we're doing transmits we're likely to hit
local_bh_enable(), which would process the softirq work.  However, I
think we may still have a problem in the above rx-only scenario--or is
it too contrived to matter?

Thanks,

Chris

* Re: question about softirqs
  2009-05-12  0:43             ` Chris Friesen
@ 2009-05-12  8:12                 ` Ingo Molnar
  0 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2009-05-12  8:12 UTC (permalink / raw)
  To: Chris Friesen, Peter Zijlstra, Thomas Gleixner, Steven Rostedt
  Cc: David Miller, linuxppc-dev, paulus, netdev


* Chris Friesen <cfriesen@nortel.com> wrote:

> This started out as a thread on the ppc list, but on the 
> suggestion of DaveM and Paul Mackerras I'm expanding the receiver 
> list a bit.
> 
> Currently, if a softirq is raised in process context the 
> TIF_RESCHED_PENDING flag gets set and on return to userspace we 
> run the scheduler, expecting it to switch to ksoftirqd to handle 
> the softirq processing.
> 
> I think I see a possible problem with this. Suppose I have a 
> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under 
> the scenario above, schedule() would re-run the spinning task 
> rather than ksoftirqd, thus preventing any incoming packets from 
> being sent up the stack until we get a real hardware 
> interrupt--which could be a whole jiffy if interrupt mitigation is 
> enabled in the net device.

TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a 
SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing 
will occur.
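
A simplified sketch of that wakeup decision follows. Everything in it is an
assumption made for illustration: the types and function names are invented,
and the check between two SCHED_OTHER tasks is deliberately left out and
treated as "no preemption".

```c
#include <assert.h>
#include <stdbool.h>

enum sched_class_model { CLASS_OTHER, CLASS_FIFO };

struct task_model {
    enum sched_class_model cls;
    int rt_prio;        /* only meaningful for CLASS_FIFO; higher wins */
    bool need_resched;  /* stands in for the per-task reschedule flag */
};

/* Would waking 'woken' force 'curr' off the CPU? (simplified rules) */
static bool wakeup_preempts(const struct task_model *curr,
                            const struct task_model *woken)
{
    if (woken->cls == CLASS_FIFO && curr->cls == CLASS_OTHER)
        return true;
    if (woken->cls == CLASS_FIFO && curr->cls == CLASS_FIFO)
        return woken->rt_prio > curr->rt_prio;
    return false;   /* a CLASS_OTHER wakee never preempts in this model */
}

/* Wake 'woken' while 'curr' is running; flag curr only if it is preempted. */
static void model_wake_up(struct task_model *curr,
                          const struct task_model *woken)
{
    assert(curr != woken);
    if (wakeup_preempts(curr, woken))
        curr->need_resched = true;
}
```

With a FIFO spinner as curr and a SCHED_OTHER helper thread as woken, the
flag stays clear, so the syscall return path has no reason to call schedule().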

> DaveM pointed out that if we're doing transmits we're likely to 
> hit local_bh_enable(), which would process the softirq work.  
> However, I think we may still have a problem in the above rx-only 
> scenario--or is it too contrived to matter?

This could occur, and the problem is really that task priorities do 
not extend across softirq work processing.

This could occur in ordinary SCHED_OTHER tasks as well, if the 
softirq is bounced to ksoftirqd - which it only should be if there's 
serious softirq overload - or, as you describe it above, if the 
softirq is raised in process context:

        if (!in_interrupt())
                wakeup_softirqd();

that's not really clean. We look into eliminating process context 
use of raise_softirq_irqsoff(). Such code sequence:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);

should be converted to something like:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);
	recheck_softirqs();

If someone does not do proper local_bh_disable()/enable() sequences
for micro-optimization reasons, then push the check to after the
critical section - and don't cause extra reschedules by waking up
ksoftirqd. raise_softirq_irqsoff() will also be faster.
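
The difference between the two sequences can be sketched with a user-space
toy model. recheck_softirqs() is only a proposed name in this message, so its
behaviour here (run whatever is pending once interrupts are back on) is an
assumption, and all struct and function names below are invented for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state; not the kernel's data structures. */
struct softirq_model {
    uint32_t pending;               /* bitmask of raised softirqs */
    bool irqs_off;                  /* true inside the irq-disabled section */
    unsigned int handled;           /* softirqs run inline */
    unsigned int ksoftirqd_wakeups; /* times the helper thread was woken */
};

/* Run every pending softirq and clear the mask. */
static void model_run_pending(struct softirq_model *m)
{
    while (m->pending) {
        m->pending &= m->pending - 1;   /* clear the lowest set bit */
        m->handled++;
    }
}

/* First sequence: the raise itself wakes the helper thread. */
static void sequence_with_wakeup(struct softirq_model *m, unsigned int nr)
{
    assert(nr < 32);
    m->irqs_off = true;                 /* local_irq_save() */
    m->pending |= UINT32_C(1) << nr;    /* raise */
    m->ksoftirqd_wakeups++;             /* wakeup on raise */
    m->irqs_off = false;                /* local_irq_restore() */
}

/* Second sequence: raise only, then check once after the section. */
static void sequence_with_recheck(struct softirq_model *m, unsigned int nr)
{
    assert(nr < 32);
    m->irqs_off = true;                 /* local_irq_save() */
    m->pending |= UINT32_C(1) << nr;    /* raise, no wakeup */
    m->irqs_off = false;                /* local_irq_restore() */
    if (m->pending)                     /* the assumed recheck step */
        model_run_pending(m);
}
```

In the model the first sequence ends with work still pending and one wakeup
charged, while the second ends with nothing pending and no wakeup.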

	Ingo

* Re: question about softirqs
  2009-05-12  8:12                 ` Ingo Molnar
  (?)
@ 2009-05-12  9:12                 ` Peter Zijlstra
  2009-05-12  9:23                   ` Ingo Molnar
  2009-05-13  5:55                     ` Evgeniy Polyakov
  -1 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-12  9:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
	David Miller

On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
> 
> > This started out as a thread on the ppc list, but on the 
> > suggestion of DaveM and Paul Mackerras I'm expanding the receiver 
> > list a bit.
> > 
> > Currently, if a softirq is raised in process context the 
> > TIF_RESCHED_PENDING flag gets set and on return to userspace we 
> > run the scheduler, expecting it to switch to ksoftirqd to handle 
> > the softirq processing.
> > 
> > I think I see a possible problem with this. Suppose I have a 
> > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under 
> > the scenario above, schedule() would re-run the spinning task 
> > rather than ksoftirqd, thus preventing any incoming packets from 
> > being sent up the stack until we get a real hardware 
> > interrupt--which could be a whole jiffy if interrupt mitigation is 
> > enabled in the net device.
> 
> TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a 
> SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing 
> will occur.
> 
> > DaveM pointed out that if we're doing transmits we're likely to 
> > hit local_bh_enable(), which would process the softirq work.  
> > However, I think we may still have a problem in the above rx-only 
> > scenario--or is it too contrived to matter?
> 
> This could occur, and the problem is really that task priorities do 
> not extend across softirq work processing.
> 
> This could occur in ordinary SCHED_OTHER tasks as well, if the 
> softirq is bounced to ksoftirqd - which it only should be if there's 
> serious softirq overload - or, as you describe it above, if the 
> softirq is raised in process context:
> 
>         if (!in_interrupt())
>                 wakeup_softirqd();
> 
> that's not really clean. We look into eliminating process context 
> use of raise_softirq_irqsoff(). Such code sequence:
> 
> 	local_irq_save(flags);
> 	...
> 	raise_softirq_irqsoff(nr);
> 	...
> 	local_irq_restore(flags);
> 
> should be converted to something like:
> 
> 	local_irq_save(flags);
> 	...
> 	raise_softirq_irqsoff(nr);
> 	...
> 	local_irq_restore(flags);
> 	recheck_softirqs();
> 
> If someone does not do proper local_bh_disable()/enable() sequences
> for micro-optimization reasons, then push the check to after the
> critical section - and don't cause extra reschedules by waking up
> ksoftirqd. raise_softirq_irqsoff() will also be faster.


Wouldn't an even better solution be to get rid of softirqs
altogether?

I see the recent work by Thomas to get threaded interrupts upstream as a
good first step towards that goal. Once the RX processing is moved to a
thread (or multiple threads), one can prioritize them in the regular
sys_sched_setscheduler() way, and it's obvious that a FIFO task above the
priority of the network tasks will have network starvation issues.

* Re: question about softirqs
  2009-05-12  9:12                 ` Peter Zijlstra
@ 2009-05-12  9:23                   ` Ingo Molnar
  2009-05-12  9:32                     ` Peter Zijlstra
  2009-05-13  4:44                       ` David Miller
  2009-05-13  5:55                     ` Evgeniy Polyakov
  1 sibling, 2 replies; 57+ messages in thread
From: Ingo Molnar @ 2009-05-12  9:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
	David Miller


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen <cfriesen@nortel.com> wrote:
> > 
> > > This started out as a thread on the ppc list, but on the 
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver 
> > > list a bit.
> > > 
> > > Currently, if a softirq is raised in process context the 
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we 
> > > run the scheduler, expecting it to switch to ksoftirqd to handle 
> > > the softirq processing.
> > > 
> > > I think I see a possible problem with this. Suppose I have a 
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under 
> > > the scenario above, schedule() would re-run the spinning task 
> > > rather than ksoftirqd, thus preventing any incoming packets from 
> > > being sent up the stack until we get a real hardware 
> > > interrupt--which could be a whole jiffy if interrupt mitigation is 
> > > enabled in the net device.
> > 
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a 
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing 
> > will occur.
> > 
> > > DaveM pointed out that if we're doing transmits we're likely to 
> > > hit local_bh_enable(), which would process the softirq work.  
> > > However, I think we may still have a problem in the above rx-only 
> > > scenario--or is it too contrived to matter?
> > 
> > This could occur, and the problem is really that task priorities do 
> > not extend across softirq work processing.
> > 
> > This could occur in ordinary SCHED_OTHER tasks as well, if the 
> > softirq is bounced to ksoftirqd - which it only should be if there's 
> > serious softirq overload - or, as you describe it above, if the 
> > softirq is raised in process context:
> > 
> >         if (!in_interrupt())
> >                 wakeup_softirqd();
> > 
> > that's not really clean. We look into eliminating process context 
> > use of raise_softirq_irqsoff(). Such code sequence:
> > 
> > 	local_irq_save(flags);
> > 	...
> > 	raise_softirq_irqsoff(nr);
> > 	...
> > 	local_irq_restore(flags);
> > 
> > should be converted to something like:
> > 
> > 	local_irq_save(flags);
> > 	...
> > 	raise_softirq_irqsoff(nr);
> > 	...
> > 	local_irq_restore(flags);
> > 	recheck_softirqs();
> > 
> > If someone does not do proper local_bh_disable()/enable() sequences
> > for micro-optimization reasons, then push the check to after the
> > critical section - and don't cause extra reschedules by waking up
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
> 
> 
> Wouldn't an even better solution be to get rid of softirqs
> altogether?
>
> I see the recent work by Thomas to get threaded interrupts
> upstream as a good first step towards that goal. Once the RX
> processing is moved to a thread (or multiple threads), one can
> prioritize them in the regular sys_sched_setscheduler() way, and it's
> obvious that a FIFO task above the priority of the network tasks
> will have network starvation issues.

Yeah, that would be "nice". A single IRQ thread plus the process 
context(s) doing networking might perform well.

Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so 
sure about - it's extra context-switching cost.

Btw, i noticed that using scheduling for work (packet, etc.) flow 
distribution standardizes and evens out the behavior of workloads. 
Softirq scheduling is really quite random currently. We have a 
random processing loop-limit in the core code and various batching 
and work-limit controls at individual usage sites. We sometimes 
piggyback to ksoftirqd. It's far easier to keep performance in check 
when things are more predictable.

But this is not an easy endeavour, and performance regressions have
to be expected and addressed if they occur. There can be random 
packet queuing details in networking drivers that just happen to 
work fine now, and might work worse with a kernel thread in place. 
So there has to be broad buy-in for the concept, and a concerted 
effort to eliminate softirq processing and most of hardirq 
processing by pushing those two elements into a single hardirq 
thread (and the rest into process context).

Not for the faint hearted. Nor is it recommended to be done without 
a good layer of asbestos.

	Ingo

* Re: question about softirqs
  2009-05-12  9:23                   ` Ingo Molnar
@ 2009-05-12  9:32                     ` Peter Zijlstra
  2009-05-12 12:20                         ` Steven Rostedt
  2009-05-13  4:44                       ` David Miller
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-12  9:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
	David Miller

On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> 
> Yeah, that would be "nice". A single IRQ thread plus the process 
> context(s) doing networking might perform well.
> 
> Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so 
> sure about - it's extra context-switching cost.

Sure, that was implied by the getting rid of softirqs ;-), on -rt we
currently suffer this hardirq/softirq thread ping-pong, it sucks.

* Re: question about softirqs
  2009-05-12  9:32                     ` Peter Zijlstra
@ 2009-05-12 12:20                         ` Steven Rostedt
  0 siblings, 0 replies; 57+ messages in thread
From: Steven Rostedt @ 2009-05-12 12:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, David Miller,
	linuxppc-dev, paulus, netdev



On Tue, 12 May 2009, Peter Zijlstra wrote:

> On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> > 
> > Yeah, that would be "nice". A single IRQ thread plus the process 
> > context(s) doing networking might perform well.
> > 
> > Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so 
> > sure about - it's extra context-switching cost.
> 
> Sure, that was implied by the getting rid of softirqs ;-), on -rt we
> currently suffer this hardirq/softirq thread ping-pong, it sucks.

I'm going to be playing around with bypassing the net-rx/tx with my 
network drivers. I'm going to add threaded irqs for my network cards and 
have the driver threads do the work to get through the tcp/ip stack.

I'll still keep the softirqs for other cards, but I want to see how fast 
it speeds things up if I have the driver thread do it.

-- Steve


* Re: question about softirqs
  2009-05-12  8:12                 ` Ingo Molnar
@ 2009-05-12 15:18                   ` Chris Friesen
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-12 15:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Thomas Gleixner, Steven Rostedt, David Miller,
	linuxppc-dev, paulus, netdev

Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:

>>I think I see a possible problem with this. Suppose I have a 
>>SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under 
>>the scenario above, schedule() would re-run the spinning task 
>>rather than ksoftirqd, thus preventing any incoming packets from 
>>being sent up the stack until we get a real hardware 
>>interrupt--which could be a whole jiffy if interrupt mitigation is 
>>enabled in the net device.

>>DaveM pointed out that if we're doing transmits we're likely to 
>>hit local_bh_enable(), which would process the softirq work.  
>>However, I think we may still have a problem in the above rx-only 
>>scenario--or is it too contrived to matter?

> This could occur, and the problem is really that task priorities do 
> not extend across softirq work processing.
> 
> This could occur in ordinary SCHED_OTHER tasks as well, if the 
> softirq is bounced to ksoftirqd - which it only should be if there's 
> serious softirq overload - or, as you describe it above, if the 
> softirq is raised in process context:

One of the reasons I brought up this issue is that there is a lot of
documentation out there that says "softirqs will be processed on return
from a syscall".  The fact that it actually depends on the scheduler
parameters of the task issuing the syscall isn't ever mentioned.
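
For illustration, the rx-only scenario quoted above can be sketched from
userspace. This is a minimal sketch, not code from the thread: the helper
names become_fifo(), poll_once() and spin_until_data() are made up here,
and whatever priority is passed to become_fifo() is an arbitrary assumption.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: move the calling thread to SCHED_FIFO.
 * Returns 0 on success, -1 (errno set) on failure, for example
 * EPERM when the caller lacks the needed privilege. */
int become_fifo(int prio)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Hypothetical helper: one non-blocking receive attempt.
 * Returns the byte count, or -1 with errno set to EAGAIN or
 * EWOULDBLOCK when nothing is queued on the socket yet. */
ssize_t poll_once(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(fd, &msg, MSG_DONTWAIT);
}

/* The spinning receiver: it never sleeps, so a lower-priority
 * thread on the same CPU gets no chance to run between calls. */
ssize_t spin_until_data(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = poll_once(fd, buf, len);
	} while (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
	return n;
}
```

A task that calls become_fifo() and then sits in spin_until_data() is the
kind of receiver the scenario describes: it keeps the CPU until a packet
has actually been queued on its socket.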

In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
source still has the following:

    Whenever a system call is about to return to userspace, or a
    hardware interrupt handler exits, any 'software interrupts'
    which are marked pending (usually by hardware interrupts) are
    run (<filename>kernel/softirq.c</filename>).

If anyone is looking at changing this code, it might be good to ensure
that at least the kernel docs are updated.

Chris

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-12  9:23                   ` Ingo Molnar
@ 2009-05-13  4:44                       ` David Miller
  2009-05-13  4:44                       ` David Miller
  1 sibling, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13  4:44 UTC (permalink / raw)
  To: mingo; +Cc: a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, paulus, netdev

From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 12 May 2009 11:23:48 +0200

>> Wouldn't the even better solution be to get rid of softirqs 
>> all-together?
>> 
>> I see the recent work by Thomas to get threaded interrupts 
>> upstream as a good first step towards that goal, once the RX 
>> processing is moved to a thread (or multiple threads) one can 
>> prioritize them in the regular sys_sched_setscheduler() way and it's
>> obvious that a FIFO task above the priority of the network tasks 
>> will have network starvation issues.
> 
> Yeah, that would be "nice". A single IRQ thread plus the process 
> context(s) doing networking might perform well.

Nice for -rt goals, but not for latency.

So we're going to regress in this area again?  I can't see how
that's so desirable, to be honest with you.

The fact that this discussion started about a task with a certain
priority not being able to make forward progress, even though it
was correctly coded, just because softirqs are being processed in
a thread context, should be a big red flag that this is a buggered up
design.

I fully expected us to be, at this point, talking about putting the
pending softirq check back into the trap return path :-/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-12 12:20                         ` Steven Rostedt
  (?)
@ 2009-05-13  4:45                         ` David Miller
  -1 siblings, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13  4:45 UTC (permalink / raw)
  To: rostedt; +Cc: a.p.zijlstra, linuxppc-dev, netdev, paulus, mingo, tglx

From: Steven Rostedt <rostedt@goodmis.org>
Date: Tue, 12 May 2009 08:20:51 -0400 (EDT)

> I'm going to be playing around with bypassing the net-rx/tx with my 
> network drivers. I'm going to add threaded irqs for my network cards and 
> have the driver threads do the work to get through the tcp/ip stack.
> 
> I'll still keep the softirqs for other cards, but I want to see how much
> it speeds things up if I have the driver thread do it.

I think your latency is going to be dreadful.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13  4:44                       ` David Miller
@ 2009-05-13  5:15                         ` Paul Mackerras
  -1 siblings, 0 replies; 57+ messages in thread
From: Paul Mackerras @ 2009-05-13  5:15 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev

David Miller writes:

> I fully expected us to be, at this point, talking about putting the
> pending softirq check back into the trap return path :-/

Would that actually do any good, in the case where the system has
decided that ksoftirqd is handling soft irqs at the moment?

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13  5:15                         ` Paul Mackerras
@ 2009-05-13  5:28                           ` David Miller
  -1 siblings, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13  5:28 UTC (permalink / raw)
  To: paulus; +Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev

From: Paul Mackerras <paulus@samba.org>
Date: Wed, 13 May 2009 15:15:34 +1000

> David Miller writes:
> 
>> I fully expected us to be, at this point, talking about putting the
>> pending softirq check back into the trap return path :-/
> 
> Would that actually do any good, in the case where the system has
> decided that ksoftirqd is handling soft irqs at the moment?

Even if ksoftirqd is running, we check and run pending softirqs from
trap return.

Sure, I imagine we could re-enter this "ksoftirq blocked by highprio
thread" situation if we get flooded every single time over and over
again.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-12  9:12                 ` Peter Zijlstra
@ 2009-05-13  5:55                     ` Evgeniy Polyakov
  2009-05-13  5:55                     ` Evgeniy Polyakov
  1 sibling, 0 replies; 57+ messages in thread
From: Evgeniy Polyakov @ 2009-05-13  5:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

Hi.

On Tue, May 12, 2009 at 11:12:58AM +0200, Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> Wouldn't the even better solution be to get rid of softirqs
> all-together?

And move tasklets into some thread context?

Only if we are ready to fix 7 times rescheduling regressions compared to
kernel threads (work queue actually). At least that's how tasklet
behaved compared to work queue 1.5 years ago in the simplest
and quite naive test where tasklet/work rescheduled itself a number of
times:

http://marc.info/?l=linux-crypto-vger&m=119462472517405&w=2

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-12 15:18                   ` Chris Friesen
@ 2009-05-13  8:34                     ` Andi Kleen
  -1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13  8:34 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

"Chris Friesen" <cfriesen@nortel.com> writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall".  The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever mentioned.

It's not mentioned because that is not currently the case.

However some network TCP RX processing can happen in process context,
which gives you most of the benefit anyways.

> In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
> source still has the following:
>
>     Whenever a system call is about to return to userspace, or a
>     hardware interrupt handler exits, any 'software interrupts'
>     which are marked pending (usually by hardware interrupts) are
>     run (<filename>kernel/softirq.c</filename>).
>
> If anyone is looking at changing this code, it might be good to ensure
> that at least the kernel docs are updated.

So far the code is not changed in mainline. There have been some
proposals only.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13  8:34                     ` Andi Kleen
  (?)
@ 2009-05-13 13:23                     ` Chris Friesen
  2009-05-13 14:15                         ` Andi Kleen
  -1 siblings, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 13:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, netdev, Steven Rostedt, David Miller,
	linuxppc-dev, paulus, Ingo Molnar, Thomas Gleixner

Andi Kleen wrote:
> "Chris Friesen" <cfriesen@nortel.com> writes:
> 
>>One of the reasons I brought up this issue is that there is a lot of
>>documentation out there that says "softirqs will be processed on return
>>from a syscall".  The fact that it actually depends on the scheduler
>>parameters of the task issuing the syscall isn't ever mentioned.

> It's not mentioned because it is not currently.

Paul Mackerras explained the current behaviour earlier in the thread
(when it was still on the ppc list).  His explanation agrees with my
exploration of the code.

"If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq.  So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage)."

If the current task is of higher priority, ksoftirqd doesn't get a
chance to run and we don't process softirqs on return from a syscall.
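
The effect can be illustrated with a toy model of the scheduler's choice.
This is only a sketch under simplifying assumptions: struct toy_task and
pick_next() are invented for illustration, the real scheduler is far more
involved, and ksoftirqd is assumed to run as an ordinary SCHED_OTHER thread.
The one property modelled is that a runnable SCHED_FIFO task is always
chosen ahead of a SCHED_OTHER task.

```c
#include <stddef.h>

/* Illustrative types only; these are not kernel structures. */
enum toy_policy { TOY_OTHER, TOY_FIFO };

struct toy_task {
	const char *name;
	enum toy_policy policy;
	int rt_prio;	/* only meaningful for TOY_FIFO; higher wins */
	int runnable;
};

/* Return the task a strict-priority scheduler would run next:
 * any runnable FIFO task beats every OTHER task; among FIFO tasks
 * the highest rt_prio wins; among OTHER tasks the first runnable
 * one is taken (fairness is deliberately not modelled). */
const struct toy_task *pick_next(const struct toy_task *t, size_t n)
{
	const struct toy_task *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!t[i].runnable)
			continue;
		if (!best) {
			best = &t[i];
		} else if (t[i].policy == TOY_FIFO &&
			   (best->policy != TOY_FIFO ||
			    t[i].rt_prio > best->rt_prio)) {
			best = &t[i];
		}
	}
	return best;
}
```

With a woken ksoftirqd and a spinning FIFO task both runnable, pick_next()
returns the spinner every time; ksoftirqd is only picked once the spinner
stops being runnable.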

Chris

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13 13:23                     ` Chris Friesen
@ 2009-05-13 14:15                         ` Andi Kleen
  0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 14:15 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd

softirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.

In normal situations (not excessive time in softirq) they don't
do anything. 

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13 14:15                         ` Andi Kleen
@ 2009-05-13 14:17                           ` Thomas Gleixner
  -1 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 14:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Andi Kleen wrote:

> > "If a soft irq is raised in process context, raise_softirq() in
> > kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> 
> softirqd is only used when the softirq runs for too long or when
> there are no suitable irq exits for a long time.
> 
> In normal situations (not excessive time in softirq) they don't
> do anything. 

Err, no. Chris is completely correct:

        if (!in_interrupt())
		wakeup_softirqd();

We can not rely on irqs coming in when the softirq is raised from
thread context. An irq_exit might be faster to process it than the
scheduler can schedule ksoftirqd in, but ksoftirqd is woken and runs
nevertheless. If it finds softirqs pending then it processes them in
its context and irq_exit calls to softirq return right away.
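
The interaction described above can be mimicked with a small userspace
model. It is a sketch only: struct toy_cpu and the toy_* functions are
invented names that imitate the control flow being discussed, and none of
it is kernel code.

```c
/* Toy per-CPU state; all names are illustrative, not kernel symbols. */
struct toy_cpu {
	unsigned int pending;		/* bitmask of raised softirqs */
	int in_interrupt;		/* nonzero in interrupt context */
	int ksoftirqd_woken;		/* ksoftirqd was made runnable */
	unsigned int run_by_irq_exit;	/* softirqs handled at irq exit */
	unsigned int run_by_ksoftirqd;	/* softirqs handled by the thread */
};

/* Follows the snippet quoted above: mark the softirq pending and,
 * when raised outside interrupt context, wake ksoftirqd. */
void toy_raise_softirq(struct toy_cpu *c, unsigned int nr)
{
	c->pending |= 1u << nr;
	if (!c->in_interrupt)
		c->ksoftirqd_woken = 1;
}

/* Leaving a hard interrupt: run whatever is still pending. */
void toy_irq_exit(struct toy_cpu *c)
{
	c->in_interrupt = 0;
	if (c->pending) {
		c->run_by_irq_exit |= c->pending;
		c->pending = 0;
	}
}

/* ksoftirqd gets the CPU: process anything left, then go idle. */
void toy_ksoftirqd_run(struct toy_cpu *c)
{
	if (c->pending) {
		c->run_by_ksoftirqd |= c->pending;
		c->pending = 0;
	}
	c->ksoftirqd_woken = 0;
}
```

Whichever of toy_irq_exit() and toy_ksoftirqd_run() comes first after a
process-context raise does the work, and the other finds nothing pending.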

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13 14:17                           ` Thomas Gleixner
@ 2009-05-13 14:24                             ` Andi Kleen
  -1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 14:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

Thomas Gleixner <tglx@linutronix.de> writes:


> Err, no. Chris is completely correct:
>
>         if (!in_interrupt())
> 		wakeup_softirqd();

Yes you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy. 

You can check this by checking the accumulated CPU time on your
ksoftirqs.  Mine are all 0 even on long running systems.

The reason Andrea originally added the softirqds was just that
if you have very softirq intensive workloads they would tie
up too much CPU time or not make enough progress with the default
"don't loop too often" heuristics. 

> We can not rely on irqs coming in when the softirq is raised from

You can't rely on it, but it happens in nearly all cases.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13 14:24                             ` Andi Kleen
@ 2009-05-13 14:54                               ` Eric Dumazet
  -1 siblings, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2009-05-13 14:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Chris Friesen, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> 
>> Err, no. Chris is completely correct:
>>
>>         if (!in_interrupt())
>> 		wakeup_softirqd();
> 
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy. 
> 
> You can check this by checking the accumulated CPU time on your
> ksoftirqs.  Mine are all 0 even on long running systems.
> 

Then it's a bug, Andi. It's quite easy to trigger ksoftirqd with a Gb ethernet link.

commit f5f293a4e3d0a0c52cec31de6762c95050156516 corrected something
(making mpstat and top correctly display softirq on cpu stats),
but apparently we still have a problem reporting correct time for processes,
particularly for ksoftirqd/x.

I have one SMP machine flooded by network frames, CPU0 handling all
the work inside ksoftirqd/0 (NAPI processing: almost no hard interrupts delivered any more).

Still, top or ps reports no more than 30% of cpu time used by
ksoftirqd, while this cpu only runs ksoftirqd/0 (100% in sirq) and has no idle time.

$ps -fp 4 ; mpstat -P 0 1 10 ; ps -fp 4
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:46 [ksoftirqd/0]
Linux 2.6.30-rc5-tip-01595-g6f75dad-dirty (svivoipvnx001)       05/13/2009      _i686_

04:45:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
04:45:02 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:03 PM    0    0.00    0.00    0.00    0.00    0.00   99.01    0.00    0.00    0.99
04:45:04 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:05 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:06 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:07 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:08 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:09 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:10 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:11 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
Average:       0    0.00    0.00    0.00    0.00    0.00   99.90    0.00    0.00    0.10
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:49 [ksoftirqd/0]

You can see here that the time consumed by ksoftirqd/0 during this 10-second time frame is *only* 3 seconds.

Therefore, we cannot trust ps, not with the current kernel.

# cat /proc/4/stat ; sleep 10 ; cat /proc/4/stat
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15347 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15670 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
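
The two /proc/4/stat samples above can be turned into seconds of CPU time.
The sketch below is an illustration, not part of the original measurement:
parse_cpu_ticks() and ticks_to_seconds() are made-up helpers, and they
assume the documented proc(5) layout in which utime and stime are fields
14 and 15, counted in clock ticks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pull utime and stime (fields 14 and 15 of
 * /proc/<pid>/stat) out of one stat line. Returns 0 on success.
 * Parsing starts after the last ')' because the command name in
 * field 2 may itself contain spaces or parentheses. */
int parse_cpu_ticks(const char *line, unsigned long *utime,
		    unsigned long *stime)
{
	const char *p = strrchr(line, ')');

	if (!p)
		return -1;
	/* Skip state (3), fields 4-8 (signed) and 9-13 (unsigned). */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
		   utime, stime) != 2)
		return -1;
	return 0;
}

/* Convert a tick delta to seconds given the tick rate, which a
 * program would normally obtain from sysconf(_SC_CLK_TCK). */
double ticks_to_seconds(unsigned long ticks, long ticks_per_sec)
{
	return (double)ticks / (double)ticks_per_sec;
}
```

Applied to the two samples above, stime advances from 15347 to 15670, a
difference of 323 ticks. Assuming the common rate of 100 ticks per second,
that is about 3.2 seconds of accounted CPU time over the 10-second window,
in line with the ps TIME column moving from 00:00:46 to 00:00:49.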


> The reason Andrea originally added the softirqds was just that
> if you have very softirq intensive workloads they would tie
> up too much CPU time or not make enough progress with the default
> "don't loop too often" heuristics. 
> 
>> We can not rely on irqs coming in when the softirq is raised from
> 
> You can't rely on it, but it happens in nearly all cases.
> 
> -Andi



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: question about softirqs
  2009-05-13 14:54                               ` Eric Dumazet
@ 2009-05-13 15:02                                 ` Andi Kleen
  -1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 15:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Thomas Gleixner, Chris Friesen, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev,
	paulus, netdev

> I have one machine SMP flooded by network frames, CPU0 handling all

Yes, that's the case ksoftirqd is supposed to handle. When you
spend a significant part of your CPU time in softirq context, it kicks
in to provide somewhat fair additional CPU time.

But most systems (like mine) don't do that.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: question about softirqs
  2009-05-13 14:24                             ` Andi Kleen
  (?)
  (?)
@ 2009-05-13 15:05                             ` Chris Friesen
  2009-05-13 15:54                                 ` Thomas Gleixner
  2009-05-13 17:01                               ` Andi Kleen
  -1 siblings, 2 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 15:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
	linuxppc-dev, paulus, Thomas Gleixner, David Miller

Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:

>>Err, no. Chris is completely correct:
>>
>>        if (!in_interrupt())
>>		wakeup_softirqd();
> 
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy. 

What about the scenario I raised earlier, where we have incoming network
packets, no hardware interrupts coming in other than the timer tick, and
a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
set?

As far as I can tell, in this scenario softirqs may not get processed on
return from a syscall (contradicting the documentation).  In the worst
case, they may not get processed until the next timer tick.
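
For reference, the polling pattern in this scenario looks roughly like the
following user-space sketch. The function name spin_recvmsg and the
max_spins bound are hypothetical (a real application would typically spin
indefinitely), and raising the task's scheduling priority is left out:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Poll fd with non-blocking recvmsg() until a message arrives or
 * max_spins attempts have been made. Returns the number of bytes
 * received, or -1 with errno set (EAGAIN if nothing arrived in time).
 */
ssize_t spin_recvmsg(int fd, void *buf, size_t len, unsigned long max_spins)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    for (unsigned long i = 0; i < max_spins; i++) {
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        ssize_t n = recvmsg(fd, &msg, MSG_DONTWAIT);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        /* No data yet: go straight back into the syscall without sleeping. */
    }
    errno = EAGAIN;
    return -1;
}
```

The task never sleeps between attempts, so it stays runnable the whole time.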

Chris

* Re: question about softirqs
  2009-05-13 15:05                             ` Chris Friesen
@ 2009-05-13 15:54                                 ` Thomas Gleixner
  2009-05-13 17:01                               ` Andi Kleen
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 15:54 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> 
> >>Err, no. Chris is completely correct:
> >>
> >>        if (!in_interrupt())
> >>		wakeup_softirqd();
> > 
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy. 
> 
> What about the scenario I raised earlier, where we have incoming network
> packets, no hardware interrupts coming in other than the timer tick, and
> a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
> set?
> 
> As far as I can tell, in this scenario softirqs may not get processed on
> return from a syscall (contradicting the documentation).  In the worst
> case, they may not get processed until the next timer tick.

Right, because your high-priority task prevents ksoftirqd from running:
ksoftirqd cannot preempt the higher-priority task.

Thanks,

	tglx

* Re: question about softirqs
  2009-05-13 15:54                                 ` Thomas Gleixner
  (?)
@ 2009-05-13 16:10                                 ` Chris Friesen
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 16:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, netdev, Steven Rostedt, linuxppc-dev, Andi Kleen,
	paulus, Ingo Molnar, David Miller

Thomas Gleixner wrote:
> On Wed, 13 May 2009, Chris Friesen wrote:

>> As far as I can tell, in this scenario softirqs may not get processed on
>> return from a syscall (contradicting the documentation).  In the worst
>> case, they may not get processed until the next timer tick.
> 
> Right, because your high-priority task prevents ksoftirqd from running:
> ksoftirqd cannot preempt the higher-priority task.

Exactly.

I'm suggesting that this point (the idea that softirqs may or may not
get processed on return from syscall depending on relative task
priority) should probably be documented somewhere, because the current
documentation (in the kernel and on the web) doesn't mention it at all.

Maybe I should just submit a patch to
Documentation/DocBook/kernel-hacking.tmpl.

Chris

* Re: question about softirqs
  2009-05-13 15:05                             ` Chris Friesen
  2009-05-13 15:54                                 ` Thomas Gleixner
@ 2009-05-13 17:01                               ` Andi Kleen
  2009-05-13 19:04                                   ` Chris Friesen
  1 sibling, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 17:01 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
	linuxppc-dev, Andi Kleen, paulus, Thomas Gleixner, David Miller

On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> 
> >>Err, no. Chris is completely correct:
> >>
> >>        if (!in_interrupt())
> >>		wakeup_softirqd();
> > 
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy. 
> 
> What about the scenario I raised earlier, where we have incoming network
> packets,

network packets are normally processed by the network packet interrupt's
softirq or alternatively in the NAPI poll loop.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: question about softirqs
  2009-05-13 17:01                               ` Andi Kleen
@ 2009-05-13 19:04                                   ` Chris Friesen
  0 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 19:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

Andi Kleen wrote:

> network packets are normally processed by the network packet interrupt's
> softirq or alternatively in the NAPI poll loop.

If we have a high priority task, ksoftirqd may not get a chance to run.

My point is simply that the documentation says that softirqs are
processed on return from a syscall, and this is not necessarily the case.

Chris

* Re: question about softirqs
  2009-05-13 19:04                                   ` Chris Friesen
@ 2009-05-13 19:13                                     ` Andi Kleen
  -1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 19:13 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> 
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
> 
> If we have a high priority task, ksoftirqd may not get a chance to run.

In this case the next interrupt will also process them. It will just
go more slowly because interrupts limit the work compared to ksoftirqd.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: question about softirqs
  2009-05-13 19:13                                     ` Andi Kleen
  (?)
@ 2009-05-13 19:44                                     ` Chris Friesen
  2009-05-13 19:53                                         ` Andi Kleen
  -1 siblings, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 19:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
	linuxppc-dev, paulus, Thomas Gleixner, David Miller

Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
>> Andi Kleen wrote:
>>
>>> network packets are normally processed by the network packet interrupt's
>>> softirq or alternatively in the NAPI poll loop.
>> If we have a high priority task, ksoftirqd may not get a chance to run.
> 
> In this case the next interrupt will also process them. It will just
> go more slowly because interrupts limit the work compared to ksoftirqd.

I realize that they will eventually get processed.  My point is that the
documentation (in-kernel, online, and in various books) says that
softirqs will be processed _on the return from a syscall_.  As we all
agree, this is not necessarily the case.

Chris

* Re: question about softirqs
  2009-05-13 19:44                                     ` Chris Friesen
@ 2009-05-13 19:53                                         ` Andi Kleen
  0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 19:53 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
> >>> softirq or alternatively in the NAPI poll loop.
> >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > 
> > In this case the next interrupt will also process them. It will just
> > go more slowly because interrupts limit the work compared to ksoftirqd.
> 
> I realize that they will eventually get processed.  My point is that the
> documentation (in-kernel, online, and in various books) says that
> softirqs will be processed _on the return from a syscall_. 

They are. The documentation is correct. 

What might not get processed is every packet that is in the per-CPU
backlog queue when the network softirq runs (for non-NAPI; for NAPI that's
obsolete anyway). That's because there are limits.

Or, when new work comes in in parallel, it doesn't process it all.

But that's always the case -- no queue is infinite, so you always
have situations where it can drop or delay items.
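
To illustrate the kind of limit meant here, a toy sketch of draining a
backlog under a fixed budget. The function name drain_with_budget and the
numbers in the usage note are hypothetical and not taken from the kernel:

```c
#include <stddef.h>

/*
 * Process at most `budget` items from a backlog of `*pending` items.
 * Returns the number processed; *pending is updated to what is left over
 * for a later pass.
 */
size_t drain_with_budget(size_t *pending, size_t budget)
{
    size_t done = (*pending < budget) ? *pending : budget;

    *pending -= done;
    return done;
}
```

For example, with 500 items queued and a budget of 300, one pass handles 300
and leaves 200 for the next pass.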

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: question about softirqs
  2009-05-13 19:53                                         ` Andi Kleen
@ 2009-05-13 20:55                                           ` Thomas Gleixner
  -1 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 20:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> > Andi Kleen wrote:
> > > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> > >> Andi Kleen wrote:
> > >>
> > >>> network packets are normally processed by the network packet interrupt's
> > >>> softirq or alternatively in the NAPI poll loop.
> > >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > > 
> > > In this case the next interrupt will also process them. It will just
> > > go more slowly because interrupts limit the work compared to ksoftirqd.
> > 
> > I realize that they will eventually get processed.  My point is that the
> > documentation (in-kernel, online, and in various books) says that
> > softirqs will be processed _on the return from a syscall_. 
> 
> They are. The documentation is correct. 

No, the documentation is wrong for the case where the task that
raised the softirq, and therefore woke up ksoftirqd, has a higher
priority than ksoftirqd. In that case the kernel does _NOT_ schedule
ksoftirqd in the return from syscall path.

And that's all Chris is pointing out.
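
The behaviour described in this thread can be summarised as a small
user-space model. This is only a sketch of the discussion, not kernel code:
the enum, the function name model_softirq_path and the convention that a
larger number means a higher priority are all assumptions made for
illustration.

```c
/* Where a pending softirq ends up being run in this simplified model. */
enum softirq_path {
    RUN_AT_IRQ_EXIT,       /* raised in interrupt context */
    RUN_IN_KSOFTIRQD,      /* ksoftirqd was woken and is allowed to run */
    DEFERRED_TO_NEXT_IRQ   /* ksoftirqd was woken but cannot preempt the raiser */
};

/*
 * in_interrupt:   non-zero if the softirq was raised in interrupt context.
 * raiser_prio:    priority of the task that raised the softirq.
 * ksoftirqd_prio: priority of ksoftirqd on that CPU.
 * Larger numbers mean higher priority (an assumption of this model).
 */
enum softirq_path model_softirq_path(int in_interrupt, int raiser_prio,
                                     int ksoftirqd_prio)
{
    if (in_interrupt)
        return RUN_AT_IRQ_EXIT;

    /* Task context: ksoftirqd is woken, but only runs once the scheduler
     * picks it over the task that raised the softirq. */
    if (raiser_prio > ksoftirqd_prio)
        return DEFERRED_TO_NEXT_IRQ;

    return RUN_IN_KSOFTIRQD;
}
```

In the third case the work waits for the next interrupt, which in the worst
case is the timer tick.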

Thanks,

	tglx
