From: Ingo Molnar <mingo@elte.hu>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org, netdev@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	paulus@samba.org, Thomas Gleixner <tglx@linutronix.de>,
	David Miller <davem@davemloft.net>
Subject: Re: question about softirqs
Date: Tue, 12 May 2009 11:23:48 +0200
Message-ID: <20090512092348.GA29796@elte.hu>
In-Reply-To: <1242119578.11251.321.camel@twins>


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen <cfriesen@nortel.com> wrote:
> > 
> > > This started out as a thread on the ppc list, but on the 
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver 
> > > list a bit.
> > > 
> > > Currently, if a softirq is raised in process context the 
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we 
> > > run the scheduler, expecting it to switch to ksoftirqd to handle 
> > > the softirq processing.
> > > 
> > > I think I see a possible problem with this. Suppose I have a 
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under 
> > > the scenario above, schedule() would re-run the spinning task 
> > > rather than ksoftirqd, thus preventing any incoming packets from 
> > > being sent up the stack until we get a real hardware 
> > > interrupt--which could be a whole jiffy if interrupt mitigation is 
> > > enabled in the net device.
> > 
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a 
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing 
> > will occur.
> > 
> > > DaveM pointed out that if we're doing transmits we're likely to 
> > > hit local_bh_enable(), which would process the softirq work.  
> > > However, I think we may still have a problem in the above rx-only 
> > > scenario--or is it too contrived to matter?
> > 
> > This could occur, and the problem is really that task priorities do 
> > not extend across softirq work processing.
> > 
> > This could occur in ordinary SCHED_OTHER tasks as well, if the 
> > softirq is bounced to ksoftirqd - which it only should be if there's 
> > serious softirq overload - or, as you describe it above, if the 
> > softirq is raised in process context:
> > 
> >         if (!in_interrupt())
> >                 wakeup_softirqd();
> > 
> > that's not really clean. We should look into eliminating process 
> > context use of raise_softirq_irqsoff(). A code sequence such as:
> > 
> > 	local_irq_save(flags);
> > 	...
> > 	raise_softirq_irqsoff(nr);
> > 	...
> > 	local_irq_restore(flags);
> > 
> > should be converted to something like:
> > 
> > 	local_irq_save(flags);
> > 	...
> > 	raise_softirq_irqsoff(nr);
> > 	...
> > 	local_irq_restore(flags);
> > 	recheck_softirqs();
> > 
> > If someone does not do proper local_bh_disable()/enable() sequences 
> > for micro-optimization reasons, then push the check to after the 
> > critical section - and don't cause extra reschedules by waking up 
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
> 
> 
> Wouldn't the even better solution be to get rid of softirqs 
> altogether?
> 
> I see the recent work by Thomas to get threaded interrupts 
> upstream as a good first step towards that goal. Once the RX 
> processing is moved to a thread (or multiple threads), one can 
> prioritize them in the regular sys_sched_setscheduler() way, and 
> it's obvious that a FIFO task above the priority of the network 
> tasks will have network starvation issues.

Yeah, that would be "nice". A single IRQ thread plus the process 
context(s) doing networking might perform well.

Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so 
sure about - it's extra context-switching cost.

Btw, i noticed that using scheduling for work (packet, etc.) flow 
distribution standardizes and evens out the behavior of workloads. 
Softirq scheduling is really quite random currently. We have a 
random processing loop-limit in the core code and various batching 
and work-limit controls at individual usage sites. We sometimes 
piggyback to ksoftirqd. It's far easier to keep performance in check 
when things are more predictable.

But this is not an easy endeavour, and performance regressions have 
to be expected and addressed if they occur. There can be random 
packet queuing details in networking drivers that just happen to 
work fine now, and might work worse with a kernel thread in place. 
So there has to be broad buy-in for the concept, and a concerted 
effort to eliminate softirq processing and most of hardirq 
processing by pushing those two elements into a single hardirq 
thread (and the rest into process context).

Not for the faint-hearted. Nor is it recommended to be done without 
a good layer of asbestos.

	Ingo
