linux-kernel.vger.kernel.org archive mirror
* RE: userspace irq balancer
@ 2003-05-21 21:43 Nakajima, Jun
  2003-05-22  0:29 ` Gerrit Huizenga
  0 siblings, 1 reply; 76+ messages in thread
From: Nakajima, Jun @ 2003-05-21 21:43 UTC
  To: jamesclv; +Cc: haveblue, pbadari, linux-kernel, gh, johnstul, mannthey

Again, since userland uses /proc/irq/N/smp_affinity, the in-kernel balancer won't touch any settings made from userland. So I don't think we have an issue here: if userland has more knowledge, it simply uses binding; if not, the generic but dumb balancer in the kernel is used. It's the same as with scheduling. If the dumb one has a critical problem, we should fix it.

At the same time, I don't believe a single almighty userland policy exists, either. One may need to write or modify a program to do what's best for a particular system anyway. Or a very simple script might work just fine.
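A minimal C sketch of such a simple binding tool follows. It assumes /proc/irq/N/smp_affinity accepts a single hexadecimal CPU bitmask, which is enough for machines with no more CPUs than there are bits in an unsigned long. The helper names affinity_path(), format_mask() and bind_irq() are hypothetical, and writing the file requires root.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/proc/irq/<irq>/smp_affinity" in buf.
 * Returns 0 on success, -1 if buf is too small. */
int affinity_path(int irq, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/irq/%d/smp_affinity", irq);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: format a CPU bitmask as hex text plus newline,
 * e.g. 0x3 (CPUs 0 and 1) becomes "3\n".  Assumes one word is enough. */
int format_mask(unsigned long mask, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lx\n", mask);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Bind IRQ irq to the CPUs in mask by writing its affinity file.
 * Requires root.  An empty mask is refused.  Returns 0 on success. */
int bind_irq(int irq, unsigned long mask)
{
    char path[64], text[32];

    if (mask == 0 || affinity_path(irq, path, sizeof path) != 0 ||
        format_mask(mask, text, sizeof text) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t want = strlen(text);
    int ok = fwrite(text, 1, want, f) == want;
    /* stdio buffers the write, so a rejected mask may only show up here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, bind_irq(19, 0x2) would steer IRQ 19 to CPU 1 only. Once that mask is written, the in-kernel balancer, as described above, leaves the IRQ alone.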

Jun
> -----Original Message-----
> From: James Cleverdon [mailto:jamesclv@us.ibm.com]
> Sent: Wednesday, May 21, 2003 7:27 AM
> To: David S. Miller; akpm@digeo.com
> Cc: arjanv@redhat.com; haveblue@us.ibm.com; wli@holomorphy.com;
> pbadari@us.ibm.com; linux-kernel@vger.kernel.org; gh@us.ibm.com;
> johnstul@us.ibm.com; mannthey@us.ibm.com
> Subject: Re: userspace irq balancer
> 
> On Tuesday 20 May 2003 05:22 pm, David S. Miller wrote:
> >    From: Andrew Morton <akpm@digeo.com>
> >    Date: Tue, 20 May 2003 02:17:12 -0700
> >
> >    Concerns have been expressed that the /proc interface may be a bit racy.
> >    One thing we do need to do is to write a /proc stresstest tool which
> > pokes numbers into the /proc files at high rates, run that under traffic
> > for a few hours.
> >
> > This issue is 100% independent of whether policy belongs in the
> > kernel or not.  Also, the /proc race problem exists and should be
> > fixed regardless.
> >
> >    Nobody has tried improving the current balancer.
> >
> > Policy does not belong in the kernel.  I don't care what algorithm
> > people decide to use, but such decisions do NOT belong in the kernel.
> 
> You keep saying that, but suppose I want to try HW IRQ balancing using the
> TPR registers.  How could I do that from userspace?  And if I could,
> wouldn't the benefit of real-time IRQ routing be lost?
> 
> It seems to me that only long-term interrupt policy can be done from
> userland.  Anything that responds quickly to fluctuating load must be
> inside the kernel.
> 
> At the moment we don't do any fast IRQ policy.  Even the original
> irq_balance only looked for idle CPUs after an interrupt was serviced.
> However, suppose you had a P4 with hyperthreading turned on.  If an IRQ is
> to be delivered to the main thread but it is busy and its sibling is idle,
> why shouldn't we deliver the interrupt to the idle sibling?  They both
> share the same caches, etc., so cache warmth isn't a problem.
> 
> --
> James Cleverdon
> IBM xSeries Linux Solutions
> {jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
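Here is a rough C sketch of the /proc stress test Andrew suggests. It assumes the same single-word hex mask format, and stress_affinity() and read_mask() are hypothetical names. The loop writes pseudo-random, non-empty masks restricted to an allowed CPU set and reads each one back. To actually hunt for races, one would run several copies at once against a real /proc/irq/N/smp_affinity file while the device is under traffic.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a single-word hex CPU mask from an affinity-style file.
 * Returns 0 on success, -1 on failure. */
int read_mask(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%lx", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical stress loop: write `iterations` pseudo-random, non-empty
 * masks restricted to `allowed`, reading each one back.  Returns the number
 * of failed writes or suspicious read-backs, or -1 if allowed is empty. */
long stress_affinity(const char *path, unsigned long allowed,
                     long iterations, unsigned int seed)
{
    long failures = 0;

    if (allowed == 0)
        return -1;
    srand(seed);
    for (long i = 0; i < iterations; i++) {
        unsigned long mask = (unsigned long)rand() & allowed;
        if (mask == 0)
            mask = allowed;          /* never write an empty mask */

        FILE *f = fopen(path, "w");
        if (f == NULL) {
            failures++;
            continue;
        }
        int bad = fprintf(f, "%lx\n", mask) < 0;
        if (fclose(f) != 0)          /* buffered write errors surface here */
            bad = 1;
        if (bad) {
            failures++;
            continue;
        }

        /* With concurrent writers the read-back need not equal our mask,
         * but it must be non-empty and stay inside the allowed set. */
        unsigned long seen = 0;
        if (read_mask(path, &seen) != 0 || seen == 0 || (seen & ~allowed) != 0)
            failures++;
    }
    return failures;
}
```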

* RE: userspace irq balancer
@ 2003-05-24  1:10 Nakajima, Jun
  0 siblings, 0 replies; 76+ messages in thread
From: Nakajima, Jun @ 2003-05-24  1:10 UTC
  To: jamesclv
  Cc: Gerrit Huizenga, haveblue, pbadari, linux-kernel, johnstul,
	mannthey, Andrew Theurer

> So, I suppose an argument could be made for setting the TPR to the vector
> number on entry of do_IRQ.  I don't think that would be a good idea.  

I agree. If we start spl-like ranking of interrupts, we will need to modify disable_irq()/enable_irq(), etc. as well, with a possible impact on device drivers.

One thing that might be helpful here is to have four priority levels, for example:
Idle (0)
User (0x10)
Kernel (0x20)
Interrupt (0x30)
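To make the idea concrete, here is a small C model, offered only as a sketch, of how per-CPU TPR values like these could steer lowest-priority delivery: among the destination CPUs, the one whose TPR priority class (bits 7:4) is lowest receives the interrupt. The enum names are hypothetical, and breaking ties by lowest CPU number is a simplification of this sketch; the exact selection rule is chipset- and APIC-specific.

```c
/* Hypothetical names for the four priority classes suggested above.
 * Only bits 7:4 (the priority class) take part in the comparison. */
enum tpr_class {
    TPR_IDLE      = 0x00,
    TPR_USER      = 0x10,
    TPR_KERNEL    = 0x20,
    TPR_INTERRUPT = 0x30
};

/* Simplified model of lowest-priority delivery: among the CPUs set in
 * dest_mask (CPU n is bit n), return the one whose TPR class is lowest.
 * Ties go to the lowest-numbered CPU, a simplification; real hardware
 * arbitrates ties itself.  Returns -1 if no CPU is selected.
 * Assumes ncpus does not exceed the number of bits in unsigned long. */
int lowest_priority_cpu(const unsigned int tpr[], int ncpus,
                        unsigned long dest_mask)
{
    int best = -1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!(dest_mask & (1UL << cpu)))
            continue;
        if (best < 0 || (tpr[cpu] & 0xF0u) < (tpr[best] & 0xF0u))
            best = cpu;
    }
    return best;
}
```

For example, if CPU 0 is handling an interrupt (TPR_INTERRUPT) while its hyperthread sibling CPU 1 is idle (TPR_IDLE), lowest_priority_cpu() on a destination mask of 0x3 picks CPU 1.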

Jun

> -----Original Message-----
> From: James Cleverdon [mailto:jamesclv@us.ibm.com]
> Sent: Thursday, May 22, 2003 8:30 AM
> To: William Lee Irwin III
> Cc: Gerrit Huizenga; Nakajima, Jun; haveblue@us.ibm.com;
> pbadari@us.ibm.com; linux-kernel@vger.kernel.org; johnstul@us.ibm.com;
> mannthey@us.ibm.com; Andrew Theurer
> Subject: Re: userspace irq balancer
> 
> On Thursday 22 May 2003 07:43 am, William Lee Irwin III wrote:
> > On Thu, May 22, 2003 at 07:18:06AM -0700, James Cleverdon wrote:
> > > Here's my old very stupid TPR patch.  It lacks TPRing soft ints for
> > > kernel preemption, etc.  Because the xTPR logic only compares the top
> > > nibble of the TPR and I don't want to mask out IRQs unnecessarily, it
> > > only tracks busy/idle and IRQ/no-IRQ.
> > > Simple enough for you, Bill?   8^)
> >
> > Simple enough, yes. But I hesitate to endorse it without making sure
> > it's not too simple.
> >
> > It's much closer to the right direction, which is actually following
> > hardware docs and then punting the fancy (potentially more performant)
> > bits up into userspace. When properly tuned, it should actually have a
> > useful interaction with explicit irq balancing via retargeting IO-APIC
> > RTE destinations as interrupts targeted at a destination specifying
> > multiple cpus won't always target a single cpu when TPRs are adjusted.
> >
> > The only real issue with the TPR is that it's an spl-like ranking of
> > interrupts, assuming a static prioritization based on vector number.
> > That doesn't really agree with the Linux model and is undesirable in
> > various scenarios; however, it's how the hardware works and so can't
> > be avoided (and the disastrous attempt to avoid it didn't DTRT anyway).
> >
> >
> > -- wli
> 
> Serial APICs have always had an spl-like effect built into them.  The
> effective TPR value of a given local APIC is:
> 	max(TPR, highest vector currently in progress) & 0xF0
> Parallel APICs don't do that because they don't have serial priority
> arbitration; instead they use the xTPRs in the bridge chips.
> 
> So, I suppose an argument could be made for setting the TPR to the vector
> number on entry of do_IRQ.  I don't think that would be a good idea.  It
> could interfere with IRQ nesting during a non-DMA IDE interrupt handler.
> And of course, an IRQ's vector has little to do with the IRQ itself, thanks
> to the vector hashing scheme used to avoid the (stupid) 2 latches per APIC
> level HW limitation of most i586 and i686 CPUs.
> 
> 
> --
> James Cleverdon
> IBM xSeries Linux Solutions
> {jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
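The effective-priority formula quoted above can be modeled in a few lines of C, together with the rule that a pending vector is accepted only if its priority class is strictly above that effective class. This is an illustrative sketch: the function names are hypothetical, and it models only the priority-class comparison, not the rest of local APIC delivery.

```c
/* Effective priority of a serial APIC, per the formula quoted above:
 *     max(TPR, highest vector currently in service) & 0xF0 */
unsigned int effective_tpr(unsigned int tpr, unsigned int highest_in_service)
{
    unsigned int p = tpr > highest_in_service ? tpr : highest_in_service;
    return p & 0xF0u;
}

/* A pending vector is delivered only if its priority class (vector & 0xF0)
 * is strictly higher than the effective priority class. */
int vector_accepted(unsigned int vector, unsigned int effective)
{
    return (vector & 0xF0u) > (effective & 0xF0u);
}
```

With the TPR at 0x20 and vector 0x41 in service, the effective priority is 0x40. Another vector in the same class, such as 0x49, must wait, while 0x51 can still nest. Because vectors are hashed, which IRQs end up sharing a class is essentially arbitrary, which is the objection raised above.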

* Re: userspace irq balancer
@ 2003-05-21 16:31 James Bottomley
  2003-05-21 20:16 ` Arjan van de Ven
  0 siblings, 1 reply; 76+ messages in thread
From: James Bottomley @ 2003-05-21 16:31 UTC
  To: Linux Kernel

I'm interested in using this for voyager.  However, I have a problem in
that voyager may have CPUs that can't accept interrupts (this is global
on voyager, but may be per-interrupt on NUMA-like systems).  I think
before we move to a userspace solution, some thought about how to cope
with this is needed.

I have several suggestions:

1. Place the masks into /proc/irq/<n>/smp_affinity at start of day and
have the userspace irqbalancer take this as the maximal mask

2. Have a separate file /proc/irq/<n>/mask(?) to expose the mask always

3. Some other method...
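On the userspace side, suggestion 1 could look roughly like this: read the start-of-day mask once, then clamp every balancing decision to it. The helpers parse_mask() and clamp_affinity() are hypothetical. Falling back to the full allowed mask when the intersection is empty is a design choice of this sketch, not something proposed in the thread.

```c
#include <stdlib.h>

/* Hypothetical helper: parse the hex mask read from
 * /proc/irq/<n>/smp_affinity at start of day (single word assumed).
 * Returns 0 on success, -1 if the text is not a hex number. */
int parse_mask(const char *text, unsigned long *out)
{
    char *end;
    unsigned long v = strtoul(text, &end, 16);

    if (end == text || (*end != '\0' && *end != '\n'))
        return -1;
    *out = v;
    return 0;
}

/* Clamp the balancer's preferred CPUs to the maximal mask.  If the two do
 * not overlap, fall back to the whole allowed mask instead of an empty one. */
unsigned long clamp_affinity(unsigned long desired, unsigned long allowed)
{
    unsigned long mask = desired & allowed;
    return mask != 0 ? mask : allowed;
}
```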

Comments would be welcome

James



* RE: userspace irq balancer
@ 2003-05-20 15:41 Nakajima, Jun
  2003-05-21 13:54 ` James Cleverdon
  0 siblings, 1 reply; 76+ messages in thread
From: Nakajima, Jun @ 2003-05-20 15:41 UTC
  To: Martin J. Bligh, David S. Miller
  Cc: haveblue, wli, arjanv, pbadari, linux-kernel, gh, johnstul,
	jamesclv, akpm, mannthey

The in-kernel load balancer does not move IRQs that are bound to particular CPUs. If userland can do better, it can just set the affinity; I believe that is how the current implementation works.

The in-kernel one is simply trying to emulate functionality found in the chipset, so of course it's not very intelligent. The main reason we need to do it in software is that x86 Linux does not update the TPR(s).
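A sketch of the "leave bound IRQs alone" rule described above follows. It assumes an IRQ counts as bound whenever its affinity mask does not cover every online CPU. balancer_may_move() is a hypothetical name and not the kernel's actual code.

```c
/* Hypothetical rule: an IRQ is "bound" if its affinity mask leaves out any
 * online CPU; the in-kernel balancer only moves IRQs that are not bound.
 * The default all-ones mask therefore stays movable. */
int balancer_may_move(unsigned long irq_affinity, unsigned long online_cpus)
{
    return (irq_affinity & online_cpus) == online_cpus;
}
```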

Thanks,
Jun

> -----Original Message-----
> From: Martin J. Bligh [mailto:mbligh@aracnet.com]
> Sent: Tuesday, May 20, 2003 7:01 AM
> To: David S. Miller
> Cc: haveblue@us.ibm.com; wli@holomorphy.com; arjanv@redhat.com;
> pbadari@us.ibm.com; linux-kernel@vger.kernel.org; gh@us.ibm.com;
> johnstul@us.ibm.com; jamesclv@us.ibm.com; akpm@digeo.com;
> mannthey@us.ibm.com
> Subject: Re: userspace irq balancer
> 
> > How does the in-kernel IRQ load balancing measure "load" and
> > "busyness"?  Herein lies the most absolutely fundamental problem with
> > this code, it fails to recognize that we end up with most of our
> > networking "load" from softint context.
> 
> OK, that's a great observation, and probably fixable. What were the
> author's comments when you told him that?
> 
> > rm -rf in-kernel-irqbalance;
> 
> It's *very* late in the day to be ripping out such chunks of code.
> 1. Prove new code works better for you => make it a config option.
> 2. Prove new code works better for everyone => rip it out.
> 
> I think we're at 1, not 2.
> 
> Note that the userspace stuff doesn't even require that the kernel
> stuff be disabled ... it should just override it (I can believe
> there may be a bug that needs fixing, but it works by design).
> 
> M.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



Thread overview: 76+ messages
2003-05-21 21:43 userspace irq balancer Nakajima, Jun
2003-05-22  0:29 ` Gerrit Huizenga
2003-05-22  1:28   ` Martin J. Bligh
2003-05-22  1:44     ` Gerrit Huizenga
2003-05-22  2:03       ` William Lee Irwin III
2003-05-22  2:04   ` William Lee Irwin III
2003-05-22  2:12     ` Zwane Mwaikambo
2003-05-22  3:57     ` Martin J. Bligh
2003-05-22 17:24       ` Bill Davidsen
2003-05-22 22:44         ` David S. Miller
2003-05-26 22:24           ` Andrea Arcangeli
2003-05-26 23:26             ` Andrew Morton
2003-05-26 23:34               ` Andrea Arcangeli
2003-05-26 23:43                 ` David S. Miller
     [not found]                   ` <20030527000639.GA3767@dualathlon.random>
2003-05-27  0:15                     ` David S. Miller
2003-05-27  0:41                       ` Andrea Arcangeli
2003-05-27  0:48                         ` David S. Miller
2003-05-27  1:09                           ` Andrea Arcangeli
2003-05-27  1:13                             ` David S. Miller
2003-05-27  1:26                               ` Andrea Arcangeli
2003-05-27  6:11                                 ` David S. Miller
2003-05-27 11:53                                   ` Andrea Arcangeli
2003-05-27 22:04                                     ` David S. Miller
2003-05-27 22:27                                       ` Andrea Arcangeli
2003-05-27 23:55                                         ` David S. Miller
2003-06-13  6:22                                         ` David S. Miller
2003-06-13 18:23                                           ` Andrea Arcangeli
2003-05-27  1:16                             ` Dave Jones
2003-05-27  1:17                               ` David S. Miller
2003-05-27  9:07                                 ` Arjan van de Ven
2003-05-27  9:10                                   ` David S. Miller
2003-05-27  1:28                               ` Andrea Arcangeli
2003-05-27  1:53                           ` William Lee Irwin III
2003-05-27  1:59                             ` Andrew Morton
2003-05-27  2:10                               ` William Lee Irwin III
2003-05-27  2:15                                 ` Zwane Mwaikambo
2003-05-27  2:44                                   ` William Lee Irwin III
2003-05-27  2:45                                     ` Zwane Mwaikambo
2003-05-27  4:22                                       ` William Lee Irwin III
2003-05-27  2:15                               ` Andrea Arcangeli
2003-05-27  2:14                             ` Andrea Arcangeli
2003-05-27  2:26                               ` William Lee Irwin III
2003-05-27  1:17                 ` Andrea Arcangeli
2003-05-27  1:20                   ` David S. Miller
2003-05-27  1:33                     ` Andrea Arcangeli
2003-05-22 14:18     ` James Cleverdon
2003-05-22 14:43       ` William Lee Irwin III
2003-05-22 15:30         ` James Cleverdon
2003-05-22 15:45           ` William Lee Irwin III
  -- strict thread matches above, loose matches on Subject: below --
2003-05-24  1:10 Nakajima, Jun
2003-05-21 16:31 James Bottomley
2003-05-21 20:16 ` Arjan van de Ven
2003-05-20 15:41 Nakajima, Jun
2003-05-21 13:54 ` James Cleverdon
2003-05-21 22:56   ` Zwane Mwaikambo
     [not found] <200305191314.06216.pbadari@us.ibm.com>
2003-05-19 22:07 ` Dave Hansen
2003-05-19 22:11   ` Arjan van de Ven
2003-05-19 22:22     ` Dave Hansen
2003-05-20  3:25       ` David S. Miller
2003-05-20  3:46         ` William Lee Irwin III
2003-05-20  5:03           ` Dave Hansen
2003-05-20  5:53             ` Martin J. Bligh
2003-05-20  6:13               ` David S. Miller
2003-05-20  6:36                 ` Dave Hansen
2003-05-20  6:40                   ` David S. Miller
2003-05-20 14:07                     ` Andrew Theurer
2003-05-20 14:21                       ` Jeff Garzik
2003-05-20 14:35                         ` Andrew Theurer
     [not found]                       ` <20030520.163833.104040023.davem@redhat.com>
2003-05-21 14:58                         ` Martin J. Bligh
2003-05-21 22:55                           ` David S. Miller
2003-05-21 11:00                     ` Kai Bankett
2003-05-20 14:01                 ` Martin J. Bligh
2003-05-20  9:00             ` Arjan van de Ven
2003-05-20  9:14               ` William Lee Irwin III
2003-05-20  9:17               ` Andrew Morton
     [not found]                 ` <20030520.172230.102567463.davem@redhat.com>
2003-05-21 14:27                   ` James Cleverdon
