* Re: question about softirqs
@ 2009-05-12 8:12 ` Ingo Molnar
0 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2009-05-12 8:12 UTC (permalink / raw)
To: Chris Friesen, Peter Zijlstra, Thomas Gleixner, Steven Rostedt
Cc: linuxppc-dev, paulus, David Miller, netdev
* Chris Friesen <cfriesen@nortel.com> wrote:
> This started out as a thread on the ppc list, but on the
> suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> list a bit.
>
> Currently, if a softirq is raised in process context the
> TIF_RESCHED_PENDING flag gets set and on return to userspace we
> run the scheduler, expecting it to switch to ksoftirqd to handle
> the softirqd processing.
>
> I think I see a possible problem with this. Suppose I have a
> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> the scenario above, schedule() would re-run the spinning task
> rather than ksoftirqd, thus preventing any incoming packets from
> being sent up the stack until we get a real hardware
> interrupt--which could be a whole jiffy if interrupt mitigation is
> enabled in the net device.
TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
will occur.
> DaveM pointed out that if we're doing transmits we're likely to
> hit local_bh_enable(), which would process the softirq work.
> However, I think we may still have a problem in the above rx-only
> scenario--or is it too contrived to matter?
This could occur, and the problem is really that task priorities do
not extend across softirq work processing.
This could occur in ordinary SCHED_OTHER tasks as well, if the
softirq is bounced to ksoftirqd - which it only should be if there's
serious softirq overload - or, as you describe it above, if the
softirq is raised in process context:
	if (!in_interrupt())
		wakeup_softirqd();
That's not really clean. We should look into eliminating process context
use of raise_softirq_irqsoff(). A code sequence such as:
	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);
should be converted to something like:
	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);
	recheck_softirqs();
If someone does not do proper local_bh_disable()/enable() sequences
for micro-optimization reasons, then push the check to after the
critical section - and don't cause extra reschedules by waking up
ksoftirqd. raise_softirq_irqsoff() will also be faster.
Ingo
^ permalink raw reply [flat|nested] 57+ messages in thread
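The conversion described above can be modelled in user space. The following is a minimal sketch, not kernel code: the struct and the model_* functions are made up for illustration, and recheck_softirqs() is only a proposed name from this thread, so its behaviour here is an assumption. The sketch counts how often ksoftirqd would be asked to wake up under each scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: none of these are the kernel's real functions. */
struct softirq_model {
	uint32_t pending;         /* one bit per softirq number */
	unsigned int wakeups;     /* times the model asked to wake ksoftirqd */
	unsigned int inline_runs; /* times pending work ran in the caller */
	bool in_interrupt;        /* stands in for in_interrupt() */
};

/* Behaviour discussed in the thread: a raise outside interrupt
 * context also requests a ksoftirqd wakeup. */
void model_raise_wakeup(struct softirq_model *m, unsigned int nr)
{
	assert(nr < 32);
	m->pending |= UINT32_C(1) << nr;
	if (!m->in_interrupt)
		m->wakeups++;
}

/* Proposed behaviour: a raise only marks the softirq pending. */
void model_raise_only(struct softirq_model *m, unsigned int nr)
{
	assert(nr < 32);
	m->pending |= UINT32_C(1) << nr;
}

/* Hypothetical recheck_softirqs(): after the critical section, run
 * whatever is pending directly in the caller's context. */
void model_recheck(struct softirq_model *m)
{
	if (m->pending) {
		m->pending = 0;
		m->inline_runs++;
	}
}
```

With two raises from process context, the first scheme records two wakeup requests, while the second records none and runs the pending work once at the recheck point.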
* Re: question about softirqs
2009-05-12 8:12 ` Ingo Molnar
(?)
@ 2009-05-12 9:12 ` Peter Zijlstra
2009-05-12 9:23 ` Ingo Molnar
2009-05-13 5:55 ` Evgeniy Polyakov
-1 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-12 9:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
David Miller
On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>
> > This started out as a thread on the ppc list, but on the
> > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > list a bit.
> >
> > Currently, if a softirq is raised in process context the
> > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > run the scheduler, expecting it to switch to ksoftirqd to handle
> > the softirqd processing.
> >
> > I think I see a possible problem with this. Suppose I have a
> > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > the scenario above, schedule() would re-run the spinning task
> > rather than ksoftirqd, thus preventing any incoming packets from
> > being sent up the stack until we get a real hardware
> > interrupt--which could be a whole jiffy if interrupt mitigation is
> > enabled in the net device.
>
> TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> will occur.
>
> > DaveM pointed out that if we're doing transmits we're likely to
> > hit local_bh_enable(), which would process the softirq work.
> > However, I think we may still have a problem in the above rx-only
> > scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:
>
> if (!in_interrupt())
> wakeup_softirqd();
>
> that's not really clean. We look into eliminating process context
> use of raise_softirq_irqsoff(). Such code sequence:
>
> local_irq_save(flags);
> ...
> raise_softirq_irqsoff(nr);
> ...
> local_irq_restore(flags);
>
> should be converted to something like:
>
> local_irq_save(flags);
> ...
> raise_softirq_irqsoff(nr);
> ...
> local_irq_restore(flags);
> recheck_softirqs();
>
> If someone does not do proper local_bh_disable()/enable() sequences
> for micro-optimization reasons, then push the check to after the
> critical section - and don't cause extra reschedules by waking up
> ksoftirqd. raise_softirq_irqsoff() will also be faster.
Wouldn't an even better solution be to get rid of softirqs
altogether?
I see the recent work by Thomas to get threaded interrupts upstream as a
good first step towards that goal. Once the RX processing is moved to a
thread (or multiple threads), one can prioritize them in the regular
sys_sched_setscheduler() way, and it's obvious that a FIFO task above the
priority of the network tasks will have network starvation issues.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 9:12 ` Peter Zijlstra
@ 2009-05-12 9:23 ` Ingo Molnar
2009-05-12 9:32 ` Peter Zijlstra
2009-05-13 4:44 ` David Miller
2009-05-13 5:55 ` Evgeniy Polyakov
1 sibling, 2 replies; 57+ messages in thread
From: Ingo Molnar @ 2009-05-12 9:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
David Miller
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen <cfriesen@nortel.com> wrote:
> >
> > > This started out as a thread on the ppc list, but on the
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > > list a bit.
> > >
> > > Currently, if a softirq is raised in process context the
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > > run the scheduler, expecting it to switch to ksoftirqd to handle
> > > the softirqd processing.
> > >
> > > I think I see a possible problem with this. Suppose I have a
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > > the scenario above, schedule() would re-run the spinning task
> > > rather than ksoftirqd, thus preventing any incoming packets from
> > > being sent up the stack until we get a real hardware
> > > interrupt--which could be a whole jiffy if interrupt mitigation is
> > > enabled in the net device.
> >
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> > will occur.
> >
> > > DaveM pointed out that if we're doing transmits we're likely to
> > > hit local_bh_enable(), which would process the softirq work.
> > > However, I think we may still have a problem in the above rx-only
> > > scenario--or is it too contrived to matter?
> >
> > This could occur, and the problem is really that task priorities do
> > not extend across softirq work processing.
> >
> > This could occur in ordinary SCHED_OTHER tasks as well, if the
> > softirq is bounced to ksoftirqd - which it only should be if there's
> > serious softirq overload - or, as you describe it above, if the
> > softirq is raised in process context:
> >
> > if (!in_interrupt())
> > wakeup_softirqd();
> >
> > that's not really clean. We look into eliminating process context
> > use of raise_softirq_irqsoff(). Such code sequence:
> >
> > local_irq_save(flags);
> > ...
> > raise_softirq_irqsoff(nr);
> > ...
> > local_irq_restore(flags);
> >
> > should be converted to something like:
> >
> > local_irq_save(flags);
> > ...
> > raise_softirq_irqsoff(nr);
> > ...
> > local_irq_restore(flags);
> > recheck_softirqs();
> >
> > If someone does not do proper local_bh_disable()/enable() sequences
> > for micro-optimization reasons, then push the check to after the
> > critical section - and don't cause extra reschedules by waking up
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
>
>
> Wouldn't the even better solution be to get rid of softirqs
> altogether?
>
> I see the recent work by Thomas to get threaded interrupts
> upstream as a good first step towards that goal, once the RX
> processing is moved to a thread (or multiple threads) one can
> prioritize them in the regular sys_sched_setscheduler() way and it's
> obvious that a FIFO task above the priority of the network tasks
> will have network starvation issues.
Yeah, that would be "nice". A single IRQ thread plus the process
context(s) doing networking might perform well.
Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
sure about - it's extra context-switching cost.
Btw, i noticed that using scheduling for work (packet, etc.) flow
distribution standardizes and evens out the behavior of workloads.
Softirq scheduling is really quite random currently. We have a
random processing loop-limit in the core code and various batching
and work-limit controls at individual usage sites. We sometimes
piggyback to ksoftirqd. It's far easier to keep performance in check
when things are more predictable.
But this is not an easy endeavour, and performance regressions have
to be expected and addressed if they occur. There can be random
packet queuing details in networking drivers that just happen to
work fine now, and might work worse with a kernel thread in place.
So there has to be broad buy-in for the concept, and a concerted
effort to eliminate softirq processing and most of hardirq
processing by pushing those two elements into a single hardirq
thread (and the rest into process context).
Not for the faint-hearted. Nor is it recommended to be done without
a good layer of asbestos.
Ingo
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 9:23 ` Ingo Molnar
@ 2009-05-12 9:32 ` Peter Zijlstra
2009-05-12 12:20 ` Steven Rostedt
2009-05-13 4:44 ` David Miller
1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-12 9:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner,
David Miller
On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.
>
> Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> sure about - it's extra context-switching cost.
Sure, that was implied by getting rid of softirqs ;-). On -rt we
currently suffer from this hardirq/softirq thread ping-pong, and it sucks.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 9:32 ` Peter Zijlstra
@ 2009-05-12 12:20 ` Steven Rostedt
0 siblings, 0 replies; 57+ messages in thread
From: Steven Rostedt @ 2009-05-12 12:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, David Miller,
linuxppc-dev, paulus, netdev
On Tue, 12 May 2009, Peter Zijlstra wrote:
> On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> >
> > Yeah, that would be "nice". A single IRQ thread plus the process
> > context(s) doing networking might perform well.
> >
> > Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> > sure about - it's extra context-switching cost.
>
> Sure, that was implied by the getting rid of softirqs ;-), on -rt we
> currently suffer this hardirq/softirq thread ping-pong, it sucks.
I'm going to be playing around with bypassing the net-rx/tx with my
network drivers. I'm going to add threaded irqs for my network cards and
have the driver threads do the work to get through the tcp/ip stack.
I'll still keep the softirqs for other cards, but I want to see how much
it speeds things up if I have the driver thread do it.
-- Steve
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 12:20 ` Steven Rostedt
(?)
@ 2009-05-13 4:45 ` David Miller
-1 siblings, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13 4:45 UTC (permalink / raw)
To: rostedt; +Cc: a.p.zijlstra, linuxppc-dev, netdev, paulus, mingo, tglx
From: Steven Rostedt <rostedt@goodmis.org>
Date: Tue, 12 May 2009 08:20:51 -0400 (EDT)
> I'm going to be playing around with bypassing the net-rx/tx with my
> network drivers. I'm going to add threaded irqs for my network cards and
> have the driver threads do the work to get through the tcp/ip stack.
>
> I'll still keep the softirqs for other cards, but I want to see how fast
> it speeds things up if I have the driver thread do it.
I think your latency is going to be dreadful.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 9:23 ` Ingo Molnar
@ 2009-05-13 4:44 ` David Miller
2009-05-13 4:44 ` David Miller
1 sibling, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13 4:44 UTC (permalink / raw)
To: mingo; +Cc: a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, paulus, netdev
From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 12 May 2009 11:23:48 +0200
>> Wouldn't the even better solution be to get rid of softirqs
>> altogether?
>>
>> I see the recent work by Thomas to get threaded interrupts
>> upstream as a good first step towards that goal, once the RX
>> processing is moved to a thread (or multiple threads) one can
>> prioritize them in the regular sys_sched_setscheduler() way and it's
>> obvious that a FIFO task above the priority of the network tasks
>> will have network starvation issues.
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.
Nice for -rt goals, but not for latency.
So we're going to regress in this area again? I can't see how
that's so desirable, to be honest with you.
The fact that this discussion started about a task with a certain
priority not being able to make forward progress, even though it
was correctly coded, just because softirqs are being processed in
a thread context, should be a big red flag that this is a buggered up
design.
I fully expected us to be, at this point, talking about putting the
pending softirq check back into the trap return path :-/
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 4:44 ` David Miller
@ 2009-05-13 5:15 ` Paul Mackerras
-1 siblings, 0 replies; 57+ messages in thread
From: Paul Mackerras @ 2009-05-13 5:15 UTC (permalink / raw)
To: David Miller
Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev
David Miller writes:
> I fully expected us to be, at this point, talking about putting the
> pending softirq check back into the trap return path :-/
Would that actually do any good, in the case where the system has
decided that ksoftirqd is handling soft irqs at the moment?
Paul.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 5:15 ` Paul Mackerras
@ 2009-05-13 5:28 ` David Miller
-1 siblings, 0 replies; 57+ messages in thread
From: David Miller @ 2009-05-13 5:28 UTC (permalink / raw)
To: paulus; +Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev
From: Paul Mackerras <paulus@samba.org>
Date: Wed, 13 May 2009 15:15:34 +1000
> David Miller writes:
>
>> I fully expected us to be, at this point, talking about putting the
>> pending softirq check back into the trap return path :-/
>
> Would that actually do any good, in the case where the system has
> decided that ksoftirqd is handling soft irqs at the moment?
Even if ksoftirqd is running, we check and run pending softirqs from
trap return.
Sure, I imagine we could re-enter this "ksoftirq blocked by highprio
thread" situation if we get flooded every single time over and over
again.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 9:12 ` Peter Zijlstra
@ 2009-05-13 5:55 ` Evgeniy Polyakov
2009-05-13 5:55 ` Evgeniy Polyakov
1 sibling, 0 replies; 57+ messages in thread
From: Evgeniy Polyakov @ 2009-05-13 5:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
Hi.
On Tue, May 12, 2009 at 11:12:58AM +0200, Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> Wouldn't the even better solution be to get rid of softirqs
> altogether?
And move tasklets into some thread context?
Only if we are ready to fix a 7x rescheduling regression compared to
kernel threads (work queues, actually). At least that's how tasklets
behaved compared to work queues 1.5 years ago in the simplest
and quite naive test, where the tasklet/work item rescheduled itself a
number of times:
http://marc.info/?l=linux-crypto-vger&m=119462472517405&w=2
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 8:12 ` Ingo Molnar
@ 2009-05-12 15:18 ` Chris Friesen
-1 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-12 15:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Thomas Gleixner, Steven Rostedt, David Miller,
linuxppc-dev, paulus, netdev
Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>>I think I see a possible problem with this. Suppose I have a
>>SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
>>the scenario above, schedule() would re-run the spinning task
>>rather than ksoftirqd, thus preventing any incoming packets from
>>being sent up the stack until we get a real hardware
>>interrupt--which could be a whole jiffy if interrupt mitigation is
>>enabled in the net device.
>>DaveM pointed out that if we're doing transmits we're likely to
>>hit local_bh_enable(), which would process the softirq work.
>>However, I think we may still have a problem in the above rx-only
>>scenario--or is it too contrived to matter?
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:
One of the reasons I brought up this issue is that there is a lot of
documentation out there that says "softirqs will be processed on return
from a syscall". The fact that it actually depends on the scheduler
parameters of the task issuing the syscall isn't ever mentioned.
In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
source still has the following:
Whenever a system call is about to return to userspace, or a
hardware interrupt handler exits, any 'software interrupts'
which are marked pending (usually by hardware interrupts) are
run (<filename>kernel/softirq.c</filename>).
If anyone is looking at changing this code, it might be good to ensure
that at least the kernel docs are updated.
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-12 15:18 ` Chris Friesen
@ 2009-05-13 8:34 ` Andi Kleen
-1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 8:34 UTC (permalink / raw)
To: Chris Friesen
Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
"Chris Friesen" <cfriesen@nortel.com> writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall". The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever mentioned.
It's not mentioned because that is not currently the case.
However some network TCP RX processing can happen in process context,
which gives you most of the benefit anyways.
> In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
> source still has the following:
>
> Whenever a system call is about to return to userspace, or a
> hardware interrupt handler exits, any 'software interrupts'
> which are marked pending (usually by hardware interrupts) are
> run (<filename>kernel/softirq.c</filename>).
>
> If anyone is looking at changing this code, it might be good to ensure
> that at least the kernel docs are updated.
So far the code has not changed in mainline. There have only been some
proposals.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 8:34 ` Andi Kleen
(?)
@ 2009-05-13 13:23 ` Chris Friesen
2009-05-13 14:15 ` Andi Kleen
-1 siblings, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 13:23 UTC (permalink / raw)
To: Andi Kleen
Cc: Peter Zijlstra, netdev, Steven Rostedt, David Miller,
linuxppc-dev, paulus, Ingo Molnar, Thomas Gleixner
Andi Kleen wrote:
> "Chris Friesen" <cfriesen@nortel.com> writes:
>
>>One of the reasons I brought up this issue is that there is a lot of
>>documentation out there that says "softirqs will be processed on return
>>from a syscall". The fact that it actually depends on the scheduler
>>parameters of the task issuing the syscall isn't ever mentioned.
> It's not mentioned because it is not currently.
Paul Mackerras explained the current behaviour earlier in the thread
(when it was still on the ppc list). His explanation agrees with my
exploration of the code.
"If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq. So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage)."
If the current task is of higher priority, ksoftirqd doesn't get a
chance to run and we don't process softirqs on return from a syscall.
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
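The starvation case above can be illustrated with a toy strict-priority pick function. This is a sketch under simplifying assumptions: model_task and model_pick_next() are hypothetical names, and the real scheduler is far more involved (fair scheduling among SCHED_OTHER tasks is not modelled at all). It only shows that a runnable FIFO task that never blocks is always chosen ahead of a SCHED_OTHER ksoftirqd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not the kernel scheduler. */
enum model_policy { MODEL_OTHER, MODEL_FIFO };

struct model_task {
	const char *name;
	enum model_policy policy;
	int rt_prio;   /* only meaningful for MODEL_FIFO */
	bool runnable;
};

/* Strict priority: any runnable FIFO task beats every OTHER task;
 * among FIFO tasks the higher rt_prio wins. Among OTHER tasks the
 * first runnable one is returned, which is a simplification. */
const struct model_task *model_pick_next(const struct model_task *tasks,
					 size_t n)
{
	const struct model_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct model_task *t = &tasks[i];

		if (!t->runnable)
			continue;
		if (!best) {
			best = t;
			continue;
		}
		if (t->policy == MODEL_FIFO && best->policy == MODEL_OTHER)
			best = t;
		else if (t->policy == MODEL_FIFO &&
			 best->policy == MODEL_FIFO &&
			 t->rt_prio > best->rt_prio)
			best = t;
	}
	return best;
}
```

As long as the FIFO task stays runnable, which is the case for a task polling with MSG_DONTWAIT, the pick never lands on ksoftirqd. Only when the FIFO task blocks does the model return the SCHED_OTHER thread.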
* Re: question about softirqs
2009-05-13 13:23 ` Chris Friesen
@ 2009-05-13 14:15 ` Andi Kleen
0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 14:15 UTC (permalink / raw)
To: Chris Friesen
Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev
> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
ksoftirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.
In normal situations (no excessive time spent in softirq) it doesn't
do anything.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 14:15 ` Andi Kleen
@ 2009-05-13 14:17 ` Thomas Gleixner
-1 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 14:17 UTC (permalink / raw)
To: Andi Kleen
Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
On Wed, 13 May 2009, Andi Kleen wrote:
> > "If a soft irq is raised in process context, raise_softirq() in
> > kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
>
> softirqd is only used when the softirq runs for too long or when
> there are no suitable irq exits for a long time.
>
> In normal situations (not excessive time in softirq) they don't
> do anything.
Err, no. Chris is completely correct:
	if (!in_interrupt())
		wakeup_softirqd();
We cannot rely on irqs coming in when the softirq is raised from
thread context. An irq_exit might process it before the scheduler can
schedule ksoftirqd in, but ksoftirqd is woken and runs nevertheless.
If it finds softirqs pending, it processes them in its own context,
and the irq_exit calls into softirq processing return right away.
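The race described above can be modelled in a few lines of userspace C.
This is only a toy sketch, not kernel code: struct cpu_model and the
model_*() helpers are hypothetical names invented for illustration, and
the real raise_softirq(), irq_exit() and ksoftirqd do considerably more.

```c
#include <stdbool.h>

/* Toy per-CPU state; every field name here is made up for the sketch. */
struct cpu_model {
	unsigned int pending;              /* bitmask of raised softirqs */
	bool in_interrupt;                 /* true in hardirq/softirq context */
	bool ksoftirqd_woken;              /* ksoftirqd has been made runnable */
	unsigned int runs_in_irq_exit;     /* batches handled on irq exit */
	unsigned int runs_in_ksoftirqd;    /* batches handled by ksoftirqd */
};

/* Mark the softirq pending; wake ksoftirqd when raised from thread context. */
static void model_raise_softirq(struct cpu_model *c, unsigned int nr)
{
	c->pending |= 1u << nr;
	if (!c->in_interrupt)
		c->ksoftirqd_woken = true;
}

/* Hard interrupt return path: handle whatever is pending, if anything. */
static void model_irq_exit(struct cpu_model *c)
{
	if (c->pending) {
		c->pending = 0;
		c->runs_in_irq_exit++;
	}
}

/* ksoftirqd finally gets the CPU: handle what is still pending, if anything. */
static void model_ksoftirqd_run(struct cpu_model *c)
{
	if (!c->ksoftirqd_woken)
		return;
	c->ksoftirqd_woken = false;
	if (c->pending) {
		c->pending = 0;
		c->runs_in_ksoftirqd++;
	}
}
```

Whichever of model_irq_exit() and model_ksoftirqd_run() is called first
after a raise does the work; the other one finds nothing pending and
returns immediately.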
Thanks,
tglx
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 14:17 ` Thomas Gleixner
@ 2009-05-13 14:24 ` Andi Kleen
-1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 14:24 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
Thomas Gleixner <tglx@linutronix.de> writes:
> Err, no. Chris is completely correct:
>
> if (!in_interrupt())
> wakeup_softirqd();
Yes, you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy.
You can check this by looking at the accumulated CPU time on your
ksoftirqd threads. Mine are all 0 even on long-running systems.
The reason Andrea originally added the ksoftirqd threads was just that
very softirq-intensive workloads would otherwise tie up too much CPU
time or not make enough progress with the default "don't loop too
often" heuristics.
> We can not rely on irqs coming in when the softirq is raised from
You can't rely on it, but it happens in nearly all cases.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 14:24 ` Andi Kleen
@ 2009-05-13 14:54 ` Eric Dumazet
-1 siblings, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2009-05-13 14:54 UTC (permalink / raw)
To: Andi Kleen
Cc: Thomas Gleixner, Chris Friesen, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev
Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
>
>
>> Err, no. Chris is completely correct:
>>
>> if (!in_interrupt())
>> wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.
>
> You can check this by checking the accumulated CPU time on your
> ksoftirqs. Mine are all 0 even on long running systems.
>
Then it's a bug, Andi. It's quite easy to trigger ksoftirqd with a Gb
Ethernet link.
commit f5f293a4e3d0a0c52cec31de6762c95050156516 corrected something
(making mpstat and top correctly display softirq in the per-CPU stats),
but apparently we still have a problem reporting correct time for
processes, particularly for ksoftirqd/x.
I have one SMP machine flooded by network frames, with CPU0 handling all
the work inside ksoftirqd/0 (NAPI processing: almost no hard interrupts
are delivered any more).
Still, top or ps reports no more than 30% of CPU time used by
ksoftirqd, while this CPU runs only ksoftirqd/0 (100% in sirq) and has
no idle time.
$ ps -fp 4 ; mpstat -P 0 1 10 ; ps -fp 4
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:46 [ksoftirqd/0]
Linux 2.6.30-rc5-tip-01595-g6f75dad-dirty (svivoipvnx001)  05/13/2009  _i686_
04:45:01 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
04:45:02 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:03 PM    0   0.00   0.00   0.00    0.00   0.00  99.01   0.00   0.00   0.99
04:45:04 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:05 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:06 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:07 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:08 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:09 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:10 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:11 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
Average:       0   0.00   0.00   0.00    0.00   0.00  99.90   0.00   0.00   0.10
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:49 [ksoftirqd/0]
You can see here that the time consumed by ksoftirqd/0 during this
10-second time frame is *only* 3 seconds.
Therefore, we cannot trust ps, not with the current kernel.
# cat /proc/4/stat ; sleep 10 ; cat /proc/4/stat
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15347 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15670 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
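The arithmetic behind those two samples can be sketched as follows. It
assumes, per proc(5), that field 15 of /proc/<pid>/stat is stime in
clock ticks, and that there are 100 ticks per second on this machine
(neither is stated above). The helper names stat_stime_ticks() and
ticks_to_seconds() are made up for this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return stime (field 15) from one /proc/<pid>/stat line, or -1 if the
 * line cannot be parsed. Parsing starts after the last ')' because the
 * comm field (field 2) may itself contain spaces or parentheses.
 */
static long stat_stime_ticks(const char *line)
{
	const char *p = strrchr(line, ')');
	unsigned long utime, stime;

	if (!p)
		return -1;
	/* state ppid pgrp session tty_nr tpgid flags minflt cminflt
	 * majflt cmajflt utime stime */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
		   &utime, &stime) != 2)
		return -1;
	return (long)stime;
}

/* Convert clock ticks to seconds; ticks_per_sec is an assumed parameter. */
static double ticks_to_seconds(long ticks, long ticks_per_sec)
{
	return (double)ticks / (double)ticks_per_sec;
}
```

With the two lines above, stime goes from 15347 to 15670, a delta of
323 ticks, or about 3.2 seconds at 100 ticks per second over a
10-second window.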
> The reason Andrea originally added the softirqds was just that
> if you have very softirq intensive workloads they would tie
> up too much CPU time or not make enough process with the default
> "don't loop too often" heuristics.
>
>> We can not rely on irqs coming in when the softirq is raised from
>
> You can't rely on it, but it happens in near all cases.
>
> -Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 14:54 ` Eric Dumazet
@ 2009-05-13 15:02 ` Andi Kleen
-1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 15:02 UTC (permalink / raw)
To: Eric Dumazet
Cc: Andi Kleen, Thomas Gleixner, Chris Friesen, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev,
paulus, netdev
> I have one machine SMP flooded by network frames, CPU0 handling all
Yes, that's the case ksoftirqd is supposed to handle. When you
spend a significant part of your CPU time in softirq context it kicks
in to provide somewhat fair additional CPU time.
But most systems (like mine) don't do that.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 14:24 ` Andi Kleen
(?)
(?)
@ 2009-05-13 15:05 ` Chris Friesen
2009-05-13 15:54 ` Thomas Gleixner
2009-05-13 17:01 ` Andi Kleen
-1 siblings, 2 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 15:05 UTC (permalink / raw)
To: Andi Kleen
Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
linuxppc-dev, paulus, Thomas Gleixner, David Miller
Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
>>Err, no. Chris is completely correct:
>>
>> if (!in_interrupt())
>> wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.
What about the scenario I raised earlier, where we have incoming network
packets, no hardware interrupts coming in other than the timer tick, and
a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
set?
As far as I can tell, in this scenario softirqs may not get processed on
return from a syscall (contradicting the documentation). In the worst
case, they may not get processed until the next timer tick.
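For concreteness, the userspace side of this scenario might look roughly
like the sketch below. It is an illustration only: try_become_fifo(),
recv_nowait() and spin_recv() are hypothetical helper names, the
max_spins bound is added so the loop terminates, and switching to
SCHED_FIFO normally needs privileges, so that call is allowed to fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask for SCHED_FIFO at the given priority; returns -1 (errno set,
 * typically EPERM) when the caller lacks the privilege. */
static int try_become_fifo(int prio)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* One non-blocking receive attempt via recvmsg() with MSG_DONTWAIT. */
static ssize_t recv_nowait(int fd, void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(fd, &msg, MSG_DONTWAIT);
}

/*
 * Spin without ever sleeping until data arrives, a real error occurs,
 * or max_spins attempts have been made. Returns the byte count, or -1
 * with errno set (EAGAIN/EWOULDBLOCK if nothing ever arrived).
 */
static ssize_t spin_recv(int fd, void *buf, size_t len,
			 unsigned long max_spins)
{
	ssize_t n = -1;
	unsigned long i;

	errno = EAGAIN;
	for (i = 0; i < max_spins; i++) {
		n = recv_nowait(fd, buf, len);
		if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
			break;
	}
	return n;
}
```

Because the loop never blocks, a task running it under SCHED_FIFO stays
runnable the whole time, which is what keeps a lower-priority ksoftirqd
off the CPU.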
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 15:05 ` Chris Friesen
@ 2009-05-13 15:54 ` Thomas Gleixner
2009-05-13 17:01 ` Andi Kleen
1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 15:54 UTC (permalink / raw)
To: Chris Friesen
Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
On Wed, 13 May 2009, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
>
> >>Err, no. Chris is completely correct:
> >>
> >> if (!in_interrupt())
> >> wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets, no hardware interrupts coming in other than the timer tick, and
> a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
> set?
>
> As far as I can tell, in this scenario softirqs may not get processed on
> return from a syscall (contradicting the documentation). In the worst
> case, they may not get processed until the next timer tick.
Right, because your high-priority task prevents ksoftirqd from running:
ksoftirqd cannot preempt the high-priority task.
Thanks,
tglx
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 15:54 ` Thomas Gleixner
(?)
@ 2009-05-13 16:10 ` Chris Friesen
-1 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 16:10 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Zijlstra, netdev, Steven Rostedt, linuxppc-dev, Andi Kleen,
paulus, Ingo Molnar, David Miller
Thomas Gleixner wrote:
> On Wed, 13 May 2009, Chris Friesen wrote:
>> As far as I can tell, in this scenario softirqs may not get processed on
>> return from a syscall (contradicting the documentation). In the worst
>> case, they may not get processed until the next timer tick.
>
> Right because your high prio tasks prevents that ksoftirqd runs,
> because it can not preempt the high priority task.
Exactly.
I'm suggesting that this point (the idea that softirqs may or may not
get processed on return from syscall depending on relative task
priority) should probably be documented somewhere, because the current
documentation (in the kernel and on the web) doesn't mention it at all.
Maybe I should just submit a patch to
Documentation/DocBook/kernel-hacking.tmpl.
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 15:05 ` Chris Friesen
2009-05-13 15:54 ` Thomas Gleixner
@ 2009-05-13 17:01 ` Andi Kleen
2009-05-13 19:04 ` Chris Friesen
1 sibling, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 17:01 UTC (permalink / raw)
To: Chris Friesen
Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
linuxppc-dev, Andi Kleen, paulus, Thomas Gleixner, David Miller
On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
>
> >>Err, no. Chris is completely correct:
> >>
> >> if (!in_interrupt())
> >> wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets,
network packets are normally processed by the network packet interrupt's
softirq or alternatively in the NAPI poll loop.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 17:01 ` Andi Kleen
@ 2009-05-13 19:04 ` Chris Friesen
0 siblings, 0 replies; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 19:04 UTC (permalink / raw)
To: Andi Kleen
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
Andi Kleen wrote:
> network packets are normally processed by the network packet interrupt's
> softirq or alternatively in the NAPI poll loop.
If we have a high priority task, ksoftirqd may not get a chance to run.
My point is simply that the documentation says that softirqs are
processed on return from a syscall, and this is not necessarily the case.
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 19:04 ` Chris Friesen
@ 2009-05-13 19:13 ` Andi Kleen
-1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 19:13 UTC (permalink / raw)
To: Chris Friesen
Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev
On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
>
> If we have a high priority task, ksoftirqd may not get a chance to run.
In this case the next interrupt will also process them. It will just
go more slowly because interrupts limit the work compared to ksoftirqd.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 19:13 ` Andi Kleen
(?)
@ 2009-05-13 19:44 ` Chris Friesen
2009-05-13 19:53 ` Andi Kleen
-1 siblings, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2009-05-13 19:44 UTC (permalink / raw)
To: Andi Kleen
Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
linuxppc-dev, paulus, Thomas Gleixner, David Miller
Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
>> Andi Kleen wrote:
>>
>>> network packets are normally processed by the network packet interrupt's
>>> softirq or alternatively in the NAPI poll loop.
>> If we have a high priority task, ksoftirqd may not get a chance to run.
>
> In this case the next interrupt will also process them. It will just
> go more slowly because interrupts limit the work compared to ksoftirqd.
I realize that they will eventually get processed. My point is that the
documentation (in-kernel, online, and in various books) says that
softirqs will be processed _on the return from a syscall_. As we all
agree, this is not necessarily the case.
Chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 19:44 ` Chris Friesen
@ 2009-05-13 19:53 ` Andi Kleen
0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2009-05-13 19:53 UTC (permalink / raw)
To: Chris Friesen
Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev
On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
> >>> softirq or alternatively in the NAPI poll loop.
> >> If we have a high priority task, ksoftirqd may not get a chance to run.
> >
> > In this case the next interrupt will also process them. It will just
> > go more slowly because interrupts limit the work compared to ksoftirqd.
>
> I realize that they will eventually get processed. My point is that the
> documentation (in-kernel, online, and in various books) says that
> softirqs will be processed _on the return from a syscall_.
They are. The documentation is correct.
What might not all be processed are the packets that are in the per-CPU
backlog queue when the network softirq runs (for non-NAPI; for NAPI
that's obsolete anyway). That's because there are limits.
Or, when new work comes in in parallel, it doesn't process it all.
But that's always the case -- no queue is infinite, so there are
always situations where it can drop or delay items.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: question about softirqs
2009-05-13 19:53 ` Andi Kleen
@ 2009-05-13 20:55 ` Thomas Gleixner
-1 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-13 20:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
David Miller, linuxppc-dev, paulus, netdev
On Wed, 13 May 2009, Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> > Andi Kleen wrote:
> > > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> > >> Andi Kleen wrote:
> > >>
> > >>> network packets are normally processed by the network packet interrupt's
> > >>> softirq or alternatively in the NAPI poll loop.
> > >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > >
> > > In this case the next interrupt will also process them. It will just
> > > go more slowly because interrupts limit the work compared to ksoftirqd.
> >
> > I realize that they will eventually get processed. My point is that the
> > documentation (in-kernel, online, and in various books) says that
> > softirqs will be processed _on the return from a syscall_.
>
> They are. The documentation is correct.
No, the documentation is wrong for the case where the task that
raised the softirq, and therefore woke up ksoftirqd, has a higher
priority than ksoftirqd. In that case the kernel does _NOT_ schedule
ksoftirqd in the return-from-syscall path.
And that's all that Chris is pointing out.
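The scheduling decision at issue can be illustrated with a deliberately
crude userspace model. It is not the kernel's scheduler: struct
toy_task, the TOY_* policies and toy_pick_next() are invented for this
sketch, and the only rule modelled is that a runnable FIFO task always
wins over a non-realtime one.

```c
#include <stddef.h>

enum toy_policy { TOY_SCHED_OTHER, TOY_SCHED_FIFO };

struct toy_task {
	const char *name;
	enum toy_policy policy;
	int rt_prio;	/* only meaningful for TOY_SCHED_FIFO */
	int runnable;
};

/*
 * Pick the task to run next: the runnable FIFO task with the highest
 * rt_prio if there is one, otherwise the first runnable non-realtime
 * task (fair scheduling among those is not modelled). Returns NULL if
 * nothing is runnable.
 */
static const struct toy_task *toy_pick_next(const struct toy_task *t,
					    size_t n)
{
	const struct toy_task *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!t[i].runnable)
			continue;
		if (!best) {
			best = &t[i];
			continue;
		}
		if (t[i].policy == TOY_SCHED_FIFO &&
		    (best->policy != TOY_SCHED_FIFO ||
		     t[i].rt_prio > best->rt_prio))
			best = &t[i];
	}
	return best;
}
```

With a runnable FIFO spinner and a runnable non-realtime ksoftirqd in the
array, toy_pick_next() keeps returning the spinner; ksoftirqd is picked
only once the spinner stops being runnable.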
Thanks,
tglx
^ permalink raw reply [flat|nested] 57+ messages in thread