linux-kernel.vger.kernel.org archive mirror
* CONFIG_IRQBALANCE for 64-bit x86 ?
@ 2007-11-20  4:12 Mark Lord
  2007-11-20  4:15 ` Ismail Dönmez
  2007-11-20  4:17 ` Nick Piggin
  0 siblings, 2 replies; 38+ messages in thread
From: Mark Lord @ 2007-11-20  4:12 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On 32-bit x86, we have CONFIG_IRQBALANCE available,
but not on 64-bit x86.  Why not?

I ask, because this feature seems almost essential to obtaining
reasonable latencies during heavy I/O with fast devices.

My 32-bit Core2Duo MythTV box drops audio frames without it,
but works perfectly *with* IRQBALANCE.

My QuadCore box works very well in 32-bit mode with IRQBALANCE,
but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
during periods of multiple heavy I/O streams (USB flash drives).

That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
so the software uses pretty much identical versions either way.

As near as I can tell, when IRQBALANCE is not configured,
all I/O device interrupts go to CPU#0.

I don't think our CPU scheduler takes that into account when assigning
tasks to CPUs, so anything sent to CPU0 runs with very high latencies.

Or something like that.

Why no IRQ_BALANCE in 64-bit mode ?
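
One rough way to check the "everything lands on CPU#0" observation is to sum
the per-CPU counters in /proc/interrupts. The sketch below assumes the usual
layout of that file (a header row of CPU names, then one row per IRQ); the
function name and the sample text are made up for illustration.

```python
# Illustrative sample in the /proc/interrupts layout; the numbers are invented.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        126          0          0          0   IO-APIC-edge      timer
 16:     103452          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 19:      20000         10          0          0   IO-APIC-fasteoi   eth0
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def per_cpu_totals(text):
    """Sum the counts of numbered (device) IRQ rows for each CPU column."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip NMI, ERR and other non-numbered rows
        counts = rest.split()[:len(cpus)]
        for i, count in enumerate(counts):
            if count.isdigit():
                totals[i] += int(count)
    return dict(zip(cpus, totals))


# On a live system one would pass open("/proc/interrupts").read() instead.
totals = per_cpu_totals(SAMPLE)
```

If nearly all of the total sits in the CPU0 column, the interrupts are not
being spread at all.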

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:12 CONFIG_IRQBALANCE for 64-bit x86 ? Mark Lord
@ 2007-11-20  4:15 ` Ismail Dönmez
  2007-11-20  4:17 ` Nick Piggin
  1 sibling, 0 replies; 38+ messages in thread
From: Ismail Dönmez @ 2007-11-20  4:15 UTC (permalink / raw)
  To: Mark Lord; +Cc: Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tuesday 20 November 2007 at 06:12:21, Mark Lord wrote:
> On 32-bit x86, we have CONFIG_IRQBALANCE available,
> but not on 64-bit x86.  Why not?
>
> I ask, because this feature seems almost essential to obtaining
> reasonable latencies during heavy I/O with fast devices.
>
> My 32-bit Core2Duo MythTV box drops audio frames without it,
> but works perfectly *with* IRQBALANCE.
>
> My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> during periods of multiple heavy I/O streams (USB flash drives).
>
> That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> so the software uses pretty much identical versions either way.
>
> As near as I can tell, when IRQBALANCE is not configured,
> all I/O device interrupts go to CPU#0.
>
> I don't think our CPU scheduler takes that into account when assigning
> tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
>
> Or something like that.
>
> Why no IRQ_BALANCE in 64-bit mode ?

Have you tried running irqbalance in userspace? Check out 
http://irqbalance.org/ . AFAIK CONFIG_IRQBALANCE is deprecated and eats 
battery power.

Regards,
ismail

-- 
Faith is believing what you know isn't so -- Mark Twain

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:12 CONFIG_IRQBALANCE for 64-bit x86 ? Mark Lord
  2007-11-20  4:15 ` Ismail Dönmez
@ 2007-11-20  4:17 ` Nick Piggin
  2007-11-20  4:29   ` Willy Tarreau
                     ` (2 more replies)
  1 sibling, 3 replies; 38+ messages in thread
From: Nick Piggin @ 2007-11-20  4:17 UTC (permalink / raw)
  To: Mark Lord; +Cc: Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> On 32-bit x86, we have CONFIG_IRQBALANCE available,
> but not on 64-bit x86.  Why not?
>
> I ask, because this feature seems almost essential to obtaining
> reasonable latencies during heavy I/O with fast devices.
>
> My 32-bit Core2Duo MythTV box drops audio frames without it,
> but works perfectly *with* IRQBALANCE.
>
> My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> during periods of multiple heavy I/O streams (USB flash drives).
>
> That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> so the software uses pretty much identical versions either way.
>
> As near as I can tell, when IRQBALANCE is not configured,
> all I/O device interrupts go to CPU#0.
>
> I don't think our CPU scheduler takes that into account when assigning
> tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
>
> Or something like that.
>
> Why no IRQ_BALANCE in 64-bit mode ?

For that matter, I'd like to know why it has been decided that the
best place for IRQ balancing is in userspace. It should be in kernel
IMO, and it would probably allow better power saving, performance,
fairness, etc. if it were to be integrated with the task balancer as
well.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:17 ` Nick Piggin
@ 2007-11-20  4:29   ` Willy Tarreau
  2007-11-20  4:37     ` Adrian Bunk
  2007-11-20  5:37   ` Arjan van de Ven
  2007-11-20 19:17   ` Andi Kleen
  2 siblings, 1 reply; 38+ messages in thread
From: Willy Tarreau @ 2007-11-20  4:29 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tue, Nov 20, 2007 at 03:17:15PM +1100, Nick Piggin wrote:
> On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > but not on 64-bit x86.  Why not?
> >
> > I ask, because this feature seems almost essential to obtaining
> > reasonable latencies during heavy I/O with fast devices.
> >
> > My 32-bit Core2Duo MythTV box drops audio frames without it,
> > but works perfectly *with* IRQBALANCE.
> >
> > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> > during periods of multiple heavy I/O streams (USB flash drives).
> >
> > That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> > so the software uses pretty much identical versions either way.
> >
> > As near as I can tell, when IRQBALANCE is not configured,
> > all I/O device interrupts go to CPU#0.
> >
> > I don't think our CPU scheduler takes that into account when assigning
> > tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
> >
> > Or something like that.
> >
> > Why no IRQ_BALANCE in 64-bit mode ?
> 
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace. It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

Agreed. When userspace has something to do with the way IRQs are
delivered, it's going to smell as bad as micro-kernels...

Willy


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:29   ` Willy Tarreau
@ 2007-11-20  4:37     ` Adrian Bunk
  2007-11-20  5:24       ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Adrian Bunk @ 2007-11-20  4:37 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Nick Piggin, Mark Lord, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Linux Kernel

On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:
> On Tue, Nov 20, 2007 at 03:17:15PM +1100, Nick Piggin wrote:
> > On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > > but not on 64-bit x86.  Why not?
> > >
> > > I ask, because this feature seems almost essential to obtaining
> > > reasonable latencies during heavy I/O with fast devices.
> > >
> > > My 32-bit Core2Duo MythTV box drops audio frames without it,
> > > but works perfectly *with* IRQBALANCE.
> > >
> > > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > > but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> > > during periods of multiple heavy I/O streams (USB flash drives).
> > >
> > > That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> > > so the software uses pretty much identical versions either way.
> > >
> > > As near as I can tell, when IRQBALANCE is not configured,
> > > all I/O device interrupts go to CPU#0.
> > >
> > > I don't think our CPU scheduler takes that into account when assigning
> > > tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
> > >
> > > Or something like that.
> > >
> > > Why no IRQ_BALANCE in 64-bit mode ?
> > 
> > For that matter, I'd like to know why it has been decided that the
> > best place for IRQ balancing is in userspace. It should be in kernel
> > IMO, and it would probably allow better power saving, performance,
> > fairness, etc. if it were to be integrated with the task balancer as
> > well.
> 
> Agreed. When userspace has something to do with the way IRQs are
> delivered, it's going to smell as bad as micro-kernels...

The next step to a micro-kernel would then be hardware drivers and file 
systems in userspace?  ;-)

> Willy

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:37     ` Adrian Bunk
@ 2007-11-20  5:24       ` Nick Piggin
  2007-11-20  5:28         ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2007-11-20  5:24 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Willy Tarreau, Mark Lord, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Linux Kernel

On Tuesday 20 November 2007 15:37, Adrian Bunk wrote:
> On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:

> > Agreed. When userspace has something to do with the way IRQs are
> > delivered, it's going to smell as bad as micro-kernels...
>
> The next step to a micro-kernel would then be hardware drivers and file
> systems in userspace?  ;-)

We already have those. So the next step would be to pretend the
performance critical ones can be in userspace and remain competitive,
wouldn't it? ;)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  5:24       ` Nick Piggin
@ 2007-11-20  5:28         ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-20  5:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adrian Bunk, Willy Tarreau, Mark Lord, Andrew Morton,
	Linus Torvalds, Ingo Molnar, Linux Kernel

Nick Piggin wrote:
> On Tuesday 20 November 2007 15:37, Adrian Bunk wrote:
>> On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:
> 
>>> Agreed. When userspace has something to do with the way IRQs are
>>> delivered, it's going to smell as bad as micro-kernels...
>> The next step to a micro-kernel would then be hardware drivers and file
>> systems in userspace?  ;-)
> 
> We already have those. So the next step would be to pretend the
> performance critical ones can be in userspace and remain competitive,
> wouldn't it? ;)

Hey, I have a great idea... we can create a microkernel^W hypervisor and 
make a single process^W domain do all the I/O...

	-hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:17 ` Nick Piggin
  2007-11-20  4:29   ` Willy Tarreau
@ 2007-11-20  5:37   ` Arjan van de Ven
  2007-11-20  7:37     ` Nick Piggin
  2007-11-20 19:17   ` Andi Kleen
  2 siblings, 1 reply; 38+ messages in thread
From: Arjan van de Ven @ 2007-11-20  5:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tue, 20 Nov 2007 15:17:15 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > but not on 64-bit x86.  Why not?

because the in-kernel one is actually quite bad.


> > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > but responsiveness sucks bigtime when run in 64-bit mode (no
> > IRQBALANCE) during periods of multiple heavy I/O streams (USB flash
> > drives).

please run the userspace irq balancer, see http://www.irqbalance.org
afaik most distros ship that by default anyway.


> > As near as I can tell, when IRQBALANCE is not configured,
> > all I/O device interrupts go to CPU#0.

that depends on your chipset; some chipsets do worse than that.

>
> > I don't think our CPU scheduler takes that into account when
> > assigning tasks to CPUs, so anything sent to CPU0 runs with very
> > high latencies.
> >
> > Or something like that.
> >
> > Why no IRQ_BALANCE in 64-bit mode ?
> 
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace. It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

actually.... no. IRQ balancing is not a "fast" decision; every time you
move an interrupt around, you end up causing really a TON of cache
line bounces, and generally really bad performance (esp if you do it
for networking ones, since you destroy the packet reassembly stuff in
the tcp/ip stack).

Instead, what ends up working is if you do high level categories of
interrupt classes and balance within those (so that no 2 networking
irqs are on the same core/package unless you have more nics than cores)
etc. Balancing on a 10 second scale seems to work quite well; no need
to pull that complexity into the kernel.... 
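
A minimal sketch of the "balance within classes" idea, assuming each IRQ has
already been tagged with a class. This is not irqbalance's actual algorithm;
the class names and the function are hypothetical.

```python
from collections import defaultdict


def place_by_class(irqs, n_cores):
    """Map each IRQ to a core so that IRQs of one class are spread out.

    irqs is a list of (irq_number, class_name) pairs. Within a class, two
    IRQs share a core only when the class has more members than there are
    cores.
    """
    by_class = defaultdict(list)
    for irq, cls in irqs:
        by_class[cls].append(irq)

    placement = {}
    offset = 0
    for cls in sorted(by_class):
        members = sorted(by_class[cls])
        for i, irq in enumerate(members):
            placement[irq] = (offset + i) % n_cores
        # Start the next class where this one stopped, so that classes do
        # not all pile onto core 0.
        offset += len(members)
    return placement


example = place_by_class(
    [(16, "net"), (17, "net"), (18, "storage"), (19, "net")], n_cores=4
)
```

Such a placement would only be recomputed on a slow timescale (the 10 second
scale above), since every move costs cache line bounces.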

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  5:37   ` Arjan van de Ven
@ 2007-11-20  7:37     ` Nick Piggin
  2007-11-20 14:47       ` Arjan van de Ven
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2007-11-20  7:37 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tuesday 20 November 2007 16:37, Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 15:17:15 +1100

> > For that matter, I'd like to know why it has been decided that the
> > best place for IRQ balancing is in userspace. It should be in kernel
> > IMO, and it would probably allow better power saving, performance,
> > fairness, etc. if it were to be integrated with the task balancer as
> > well.
>
> actually.... no. IRQ balancing is not a "fast" decision; every time you

I didn't say anything of the sort. But IRQ load could still fluctuate
a lot more rapidly than we'd like to wake up the irqbalancer.


> move an interrupt around, you end up causing really a TON of cache
> line bounces, and generally really bad performance

All the more reason why the kernel should do it. When I say move it to
the kernel, I don't mean because I want to move IRQs 1 000 000 times
per second and can't sustain enough context switches to do it in
userspace. Userspace basically has insufficient information to do it
as well as kernel.

We do task balancing in the kernel too, it's a pretty similar problem
(although granted it is less feasible for userspace because tasks are
created and destroyed very often)


> (esp if you do it 
> for networking ones, since you destroy the packet reassembly stuff in
> the tcp/ip stack).
>
> Instead, what ends up working is if you do high level categories of
> interrupt classes and balance within those (so that no 2 networking
> irqs are on the same core/package unless you have more nics than cores)

Sure, but you say that like it is difficult information for the kernel
to know about. Actually it is much easier. Note that you can still
bind interrupts to specific CPUs.
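
For reference, binding is done by writing a hexadecimal CPU bitmask to
/proc/irq/<N>/smp_affinity. A small sketch of building that mask, assuming
fewer than 32 CPUs so that a single hex word is enough; both helper names are
made up, and the write itself needs root.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for a collection of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            # Assumption of this sketch: one 32-bit word only.
            raise ValueError("this sketch handles CPUs 0..31 only")
        mask |= 1 << cpu
    return format(mask, "x")


def bind_irq(irq, cpus):
    """Write the mask for `cpus` to the IRQ's smp_affinity file (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus))
```

For example, affinity_mask([1, 3]) gives "a", which restricts an IRQ to CPU1
and CPU3.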


> etc. Balancing on a 10 second scale seems to work quite well; no need
> to pull that complexity into the kernel....

My perspective is that it isn't a good idea to have such a critical
piece of infrastructure outside the kernel.

I want the kernel to balance interrupts and tasks fairly; maybe move
interrupts closer to the tasks they are interacting with (instead of,
or combined with our current policy of moving tasks near the interrupts,
which can be much more damaging for cache and NUMA); move all interrupts
to a single core when there is enough capacity and we are balancing for
power savings; do exponential interrupt balancing backoff when it isn't
required; etc. Not easy to do all that in userspace.
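
To make the "exponential backoff" point concrete, here is a minimal sketch of
how a balancer could stretch the time between passes while nothing needs
moving. The base and cap values are arbitrary assumptions, not taken from any
existing balancer.

```python
def next_interval(current, moved, base=1.0, cap=64.0):
    """Return the delay in seconds before the next balancing pass.

    If the last pass had to move an interrupt, go back to the short base
    interval; otherwise double the delay, up to the cap.
    """
    if moved:
        return base
    return min(current * 2, cap)
```

Starting from 1 second, quiet passes would run after 2, 4, 8, ... seconds up
to the cap, and the first pass that moves something resets the delay.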

Any reason you actually think it is a good idea, aside from the fact
that a userspace solution was able to be better than a crappy old
kernel one?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  7:37     ` Nick Piggin
@ 2007-11-20 14:47       ` Arjan van de Ven
  2007-11-20 15:43         ` Nick Piggin
                           ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Arjan van de Ven @ 2007-11-20 14:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tue, 20 Nov 2007 18:37:39 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > actually.... no. IRQ balancing is not a "fast" decision; every time
> > you
> 
> I didn't say anything of the sort. But IRQ load could still fluctuate
> a lot more rapidly than we'd like to wake up the irqbalancer.

irq load fluctuates by definition. but acting on it faster isn't the
right thing.
> 
> 
> > move an interrupt around, you end up causing really a TON of cache
> > line bounces, and generally really bad performance
> 
> All the more reason why the kernel should do it. When I say move it to
> the kernel, I don't mean because I want to move IRQs 1 000 000 times
> per second and can't sustain enough context switches to do it in
> userspace. Userspace basically has insufficient information to do it
> as well as kernel.

like what?
Assuming this is a "once every few seconds" decision (and really it is,
esp for networking)....
> 
> 
> > (esp if you do it 
> > for networking ones, since you destroy the packet reassembly stuff
> > in the tcp/ip stack).
> >
> > Instead, what ends up working is if you do high level categories of
> > interrupt classes and balance within those (so that no 2 networking
> > irqs are on the same core/package unless you have more nics than
> > cores)
> 
> Sure, but you say that like it is difficult information for the kernel
> to know about. Actually it is much easier. Note that you can still
> bind interrupts to specific CPUs.

I assume you've read what/how irqbalance does; good luck convincing
people that that kind of policy belongs in the kernel.
> 
> 
> > etc. Balancing on a 10 second scale seems to work quite well; no
> > need to pull that complexity into the kernel....
> 
> My perspective is that it isn't a good idea to have such a critical
> piece of infrastructure outside the kernel.

kernel or kernel source? If there was a good place in the kernel source
I'd not be against moving irqbalance there. In the kernel... not needed.
(also because on single socket machines, the irqbalancer basically has
a one-shot task because there balancing is effectively a static setup)

The same ("critical piece of infrastructure") can be said about other
things, like udev and ... even hal. Nobody is arguing for moving those
into the kernel though....

> 
> I want the kernel to balance interrupts and tasks fairly; 

with irqthreads that will come for free soon.

>maybe move
> interrupts closer to the tasks they are interacting with (instead of,
> or combined with our current policy of moving tasks near the
> interrupts, which can be much more damaging for cache and NUMA);

interrupts and tasks have an N:M relationship.... or sometimes 1:M
where tasks only depend on one irq. Moving the irq around then tends to
be a loss. For NUMA, you actually very likely want the IRQ on the node
that the IO is associated with.

> move
> all interrupts to a single core when there is enough capacity and we
> are balancing for power savings; 

irqbalance does that today.

>do exponential interrupt balancing
> backoff when it isn't required; etc. Not easy to do all that in
> userspace.
> 
> Any reason you actually think it is a good idea, aside from the fact
> that a userspace solution was able to be better than a crappy old
> kernel one?

I listed a few;
1) it's policy 
2) the memory is only needed for a short time (20 seconds or so) on
single-socket machines
3) it makes decisions on "subjective" information such as interrupt
device classes that the kernel currently just doesn't have (it could
grow that obviously), and is clearly policy information.




-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 14:47       ` Arjan van de Ven
@ 2007-11-20 15:43         ` Nick Piggin
  2007-11-20 19:07           ` Arjan van de Ven
  2007-11-20 15:47         ` Mark Lord
  2007-11-20 22:01         ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2007-11-20 15:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 18:37:39 +1100
>
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > actually.... no. IRQ balancing is not a "fast" decision; every time
> > > you
> >
> > I didn't say anything of the sort. But IRQ load could still fluctuate
> > a lot more rapidly than we'd like to wake up the irqbalancer.
>
> irq load fluctuates by definition. but acting on it faster isn't the
> right thing.

Of course it is, if you want to effectively use your resources.
Imagine if the task balancer only polled once every 10s.


> > > move an interrupt around, you end up causing really a TON of cache
> > > line bounces, and generally really bad performance
> >
> > All the more reason why the kernel should do it. When I say move it to
> > the kernel, I don't mean because I want to move IRQs 1 000 000 times
> > per second and can't sustain enough context switches to do it in
> > userspace. Userspace basically has insufficient information to do it
> > as well as kernel.
>
> like what?

Knowledge of wakeup events, runqueue load, task and group fairness
requirements, the task balancer's consolidation of load to fewer cores.


> Assuming this is a "once every few seconds" decision (and really it is,
> esp for networking)....

Definitely not always the case. Sometimes fairness is a top concern, in
which case you probably want a lot better response than the hard coded
10 seconds in the userspace thing.


> > > (esp if you do it
> > > for networking ones, since you destroy the packet reassembly stuff
> > > in the tcp/ip stack).
> > >
> > > Instead, what ends up working is if you do high level categories of
> > > interrupt classes and balance within those (so that no 2 networking
> > > irqs are on the same core/package unless you have more nics than
> > > cores)
> >
> > Sure, but you say that like it is difficult information for the kernel
> > to know about. Actually it is much easier. Note that you can still
> > bind interrupts to specific CPUs.
>
> I assume you've read what/how irqbalance does; good luck convincing
> people that that kind of policy belongs in the kernel.

Lots of code to get topology and device information. Some constants
that make assumptions about the machine it is running on and may or may
not agree with what the task scheduler is trying to do. Some
classification stuff which makes guesses about how a particular bit of
hardware or device driver wants to be balanced. Hacks to poll hotplugging
and topology changes.

I'm still convinced. Who isn't?


> > > etc. Balancing on a 10 second scale seems to work quite well; no
> > > need to pull that complexity into the kernel....
> >
> > My perspective is that it isn't a good idea to have such a critical
> > piece of infrastructure outside the kernel.
>
> kernel or kernel source? If there was a good place in the kernel source
> I'd not be against moving irqbalance there. In the kernel... not needed.
> (also because on single socket machines, the irqbalancer basically has
> a one-shot task because there balancing is effectively a static setup)

I don't think that's a good argument for not having it in kernel.


> The same ("critical piece of infrastructure") can be said about other
> things, like udev and ... even hal. Nobody is arguing for moving those
> into the kernel though....

Maybe because there aren't any good arguments. I have good arguments
for irq balancing, though, which aren't invalidated by this observation.


> > I want the kernel to balance interrupts and tasks fairly;
>
> with irqthreads that will come for free soon.

No it won't. It will balance irqthreads. And irqthreads may not even
exist depending on the configuration.


> >maybe move
> > interrupts closer to the tasks they are interacting with (instead of,
> > or combined with our current policy of moving tasks near the
> > interrupts, which can be much more damaging for cache and NUMA);
>
> interrupts and tasks have an N:M relationship.... or sometimes 1:M
> where tasks only depend on one irq. Moving the irq around then tends to
> be a loss. For NUMA, you actually very likely want the IRQ on the node
> that the IO is associated with.

And the kernel knows all this intimately. And it isn't always that
straightforward. And even if it were for NUMA, you still have SMP
within NUMA.


> > move
> > all interrupts to a single core when there is enough capacity and we
> > are balancing for power savings;
>
> irqbalance does that today.

To the same core which the task scheduler moves tasks? If so, I missed
that. Still, I guess that's the easiest thing to do.


> >do exponential interrupt balancing
> > backoff when it isn't required; etc. Not easy to do all that in
> > userspace.
> >
> > Any reason you actually think it is a good idea, aside from the fact
> > that a userspace solution was able to be better than a crappy old
> > kernel one?
>
> I listed a few;
> 1) it's policy

I don't think that's such a constructive point. Task balancing is
policy in exactly the same way.


> 2) the memory is only needed for a short time (20 seconds or so) on
> single-socket machines

Actually it could be a good idea for fairness and load balancing
to do it more than for a short time. Isn't it easily possible to
have a single socket, multicore system which can overload all cores
with combined IO (including a fair amount of int processing overhead),
but that often runs within CPU capacity?


> 3) it makes decisions on "subjective" information such as interrupt
> device classes that the kernel currently just doesn't have (it could
> grow that obviously), and is clearly policy information.

I'd argue that the kernel, eg. drivers, subsystems, arch code, knows
about this stuff better than irqbalance does anyway.

More out of place IMO, is irqbalance has things like checking for
NAPI turned on in a driver and in that case it does something specific
according to its knowledge of kernel implementation details.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 14:47       ` Arjan van de Ven
  2007-11-20 15:43         ` Nick Piggin
@ 2007-11-20 15:47         ` Mark Lord
  2007-11-20 15:52           ` Mark Lord
  2007-11-20 22:01         ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Mark Lord @ 2007-11-20 15:47 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Arjan van de Ven wrote:
>..
> I listed a few;
> 1) it's policy 
> 2) the memory is only needed for a short time (20 seconds or so) on
> single-socket machines
> 3) it makes decisions on "subjective" information such as interrupt
> device classes that the kernel currently just doesn't have (it could
> grow that obviously), and is clearly policy information.
..

It's much more than just "policy".
Distributing IRQs across available cores is *essential* functionality,
not an optional "extra" as this would have it be.

After reading some of the replies, I installed it on my malfunctioning 64-bit
system, but discovered it does not perform nearly as well as the kernel solution
in the 32-bit system does.

Responsiveness was jerky, and it took a long time to have any noticeable effect.

And in the end, it still just assigned IRQs to two of the four available cores.
Which still results in the task scheduler fighting against IRQs more than necessary.
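
A quick way to see which cores an IRQ is currently allowed on is to decode the
hex mask read from /proc/irq/<N>/smp_affinity. The helper below is only a
sketch with a made-up name; it assumes the usual format of that file, where
masks wider than 32 bits appear as comma-separated 8-digit groups.

```python
def cpus_in_mask(text):
    """Decode an smp_affinity hex mask into a sorted list of CPU numbers."""
    # Groups are zero-padded to 8 hex digits, so dropping the commas yields
    # one long hex number.
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]
```

A mask of "3" decodes to [0, 1], i.e. only two of four cores.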

Much of this could be due to a slow response curve in the userspace balancer (?),
but I have not yet examined it for such bugs.  Hopefully it also is clever enough
to mlock() itself, and to run at a low RT priority ? 

It really does need to respond *quickly* to changes in IRQ load,
as otherwise I see dropouts on sound playback (let alone video..) and the like.

The vast majority of Linux machines are "single package", and this software
appears to be designed more for multi package, and doesn't do a great job here
on the single package Intel cores I have (Core2duo, Core2quad).

Cheers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 15:47         ` Mark Lord
@ 2007-11-20 15:52           ` Mark Lord
  2007-11-20 16:02             ` Arjan van de Ven
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Lord @ 2007-11-20 15:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Mark Lord wrote:
> Arjan van de Ven wrote:
>> ..
>> I listed a few;
>> 1) it's policy
>> 2) the memory is only needed for a short time (20 seconds or so) on
>> single-socket machines
>> 3) it makes decisions on "subjective" information such as interrupt
>> device classes that the kernel currently just doesn't have (it could
>> grow that obviously), and is clearly policy information.
> ..
> 
> It's much more than just "policy".
> Distributing IRQs across available cores is *essential* functionality,
> not an optional "extra" as this would have it be.
> 
> After reading some of the replies, I installed it on my malfunctioning 64-bit
> system, but discovered it does not perform nearly as well as the kernel solution
> in the 32-bit system does.
> 
> Responsiveness was jerky, and it took a long time to have any noticeable effect.
> 
> And in the end, it still just assigned IRQs to two of the four available cores.
> Which still results in the task scheduler fighting against IRQs more than necessary.
> 
> Much of this could be due to a slow response curve in the userspace balancer (?),
> but I have not yet examined it for such bugs.  Hopefully it also is clever enough
> to mlock() itself, and to run at a low RT priority ?
> 
> It really does need to respond *quickly* to changes in IRQ load,
> as otherwise I see dropouts on sound playback (let alone video..) and the like.
> 
> The vast majority of Linux machines are "single package", and this software
> appears to be designed more for multi package, and doesn't do a great job here
> on the single package Intel cores I have (Core2duo, Core2quad).
..

All of which reminds me of perhaps *the* most important reason to keep
core functionality like "IRQ distribution" *inside* the kernel:

   It has to pass peer review on this mailing list.

External utilities have no such accountability, and can generally just
follow the whims of their maintainers at the expense of kernel performance.

Not that this may be the case (or not) here, but..

Cheers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 15:52           ` Mark Lord
@ 2007-11-20 16:02             ` Arjan van de Ven
  2007-11-20 16:10               ` Mark Lord
  2007-11-20 18:42               ` Mark Lord
  0 siblings, 2 replies; 38+ messages in thread
From: Arjan van de Ven @ 2007-11-20 16:02 UTC (permalink / raw)
  To: Mark Lord
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tue, 20 Nov 2007 10:52:48 -0500
Mark Lord <lkml@rtr.ca> wrote:
> 
> All of which reminds me of perhaps *the* most important reason to keep
> core functionality like "IRQ distribution" *inside* the kernel:
> 
>    It has to pass peer review on this mailing list.


that's a reason to keep it in the *source*, that's not the same as
keeping it in ring0 pinning down memory all the time etc ;)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 16:02             ` Arjan van de Ven
@ 2007-11-20 16:10               ` Mark Lord
  2007-11-20 18:42               ` Mark Lord
  1 sibling, 0 replies; 38+ messages in thread
From: Mark Lord @ 2007-11-20 16:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 10:52:48 -0500
> Mark Lord <lkml@rtr.ca> wrote:
>> All of which reminds me of perhaps *the* most important reason to keep
>> core functionality like "IRQ distribution" *inside* the kernel:
>>
>>    It has to pass peer review on this mailing list.
> 
> 
> that's a reason to keep it in the *source*, that's not the same as
..

Ack.  :)


> keeping it in ring0 pinning down memory all the time etc ;)
..

I belive it *must* remain pinned in memory to be effective,
because I also know it must run much more frequently than it
currently seems to run, in order to respond to quick changes
in IRQ load.

Eg. a heretofore idle device is suddenly now being used to copy
a DVD-sized file around.  It *must* respond quickly to changes
in load like this, or system latencies will suffer badly.

Cheers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 16:02             ` Arjan van de Ven
  2007-11-20 16:10               ` Mark Lord
@ 2007-11-20 18:42               ` Mark Lord
  1 sibling, 0 replies; 38+ messages in thread
From: Mark Lord @ 2007-11-20 18:42 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

(resending this one to the list).

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 10:47:24 -0500
> Mark Lord <lkml@rtr.ca> wrote:
..
>> After reading some of the replies, I installed it on my
>> malfunctioning 64-bit system, but discovered it does not perform
>> nearly as well as the kernel solution in the 32-bit system does.
>
> can you send me the output you get from running irqbalance with the
> --debug option? That'll show me what decisions it made and why
..

The next time I'm using that system for large I/O I will try and do so.
But the shortcomings seem rather obvious already (more below).

>> Much of this could be due to a slow response curve in the userspace
>> balancer (?), but I have not yet examined it for such bugs.
>> Hopefully it also is clever enough to mlock() itself, and to run at a
>> low RT priority ? 
>
> there's no need for either of those two.
..

But there is!  If it is not resident in memory, then it has to be paged in,
which itself takes IRQs, before it can redistribute any IRQs.  And when the
situation is bad, the page-in device may be the very one suffering from
poor response.  Which just makes the system stutter even more.
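
For illustration only, here is a minimal sketch of what "mlock() itself and run at a low RT priority" could look like for a userspace daemon. It is not taken from irqbalance. The function name `pin_and_prioritize`, the priority value 1, and the numeric `MCL_*` constants (the usual Linux x86 values) are assumptions. Both steps normally need elevated privileges, so the sketch reports what succeeded rather than failing.

```python
import ctypes
import ctypes.util
import os

# Assumed Linux x86 values of the mlockall() flags (see <sys/mman.h>).
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later


def pin_and_prioritize(rt_priority=1):
    """Try to lock all memory and switch to SCHED_FIFO.

    Returns (locked, realtime) as booleans. Hypothetical helper:
    a failure (for example, missing privileges) is reported, not raised.
    """
    locked = False
    realtime = False
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        # mlockall() returns 0 on success, -1 on error.
        locked = libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
    except (OSError, AttributeError):
        locked = False
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
        realtime = True
    except (OSError, AttributeError):
        realtime = False
    return locked, realtime
```

Whether either step is actually needed is exactly what is disputed above; the sketch only shows that both are cheap to request from userspace.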

>> It really does need to respond *quickly* to changes in IRQ load,
>> as otherwise I see dropouts on sound playback (let alone video..) and
>> the like.
>
> the problem is, you cannot respond quickly like that without
> sacrificing huge heaps of performance, especially on networking.
..

You are more expert on that aspect than I am.
But surely networking can be taken into account when
distributing other IRQs dynamically ?

>> The vast majority of Linux machines are "single package", and this
>> software appears to be designed more for multi package, 
>
> it's not. It just right now makes the assumption that on single package
> it can do a good enough job with a static balancing.
> Maybe you've found a case that proves that assumption wrong.
..

I think perhaps the existing algorithm makes the assumptions of
a static configuration of IRQ generating devices, and an unchanging
IRQ average frequency among them.

Neither assumption is valid in a hotplug environment, and the second
assumption is certainly not true on most of my machines.

The lone 64-bit desktop configuration I have here is the only one without
in-kernel IRQ distribution, and it has the fastest (2.5GHz) clock speed,
the largest number (4) of CPU cores, and the most memory (4GB)
of any of the machines here.

And yet it really felt "jerky" in use when copying data around last night,
even after installing/running the userspace irqbalance daemon.

I eventually just moved the work over to a slower machine with a 32-bit
kernel (notebook, 2.1GHz, two cores, 3GB), and things finished more rapidly
and with no noticeable ill effects on the GUI at the time.

The workload in all cases here was plugging in 2GB USB sticks,
and copying a 2GB image to them, and then unplugging/replugging them
and running md5sum to verify correct data transfers.

I had a lot of them (14) to do, and so generally two or three sticks
were plugged in and in use at any given time.

A last note on the quad-core, is that irqbalance *never* used
more than two cores.  Dunno why not.
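
One way to check which cores an IRQ is allowed on is to decode the hex bitmask found in /proc/irq/<N>/smp_affinity. A minimal sketch follows; the helper names `cpus_from_affinity_mask` and `affinity_mask_for` are hypothetical, and the sample masks are made up rather than taken from the machine described here.

```python
def cpus_from_affinity_mask(mask_text):
    """Decode an smp_affinity-style hex bitmask into a list of CPU numbers."""
    # Wide masks are printed as comma-separated 32-bit hex groups, most
    # significant group first, so dropping the commas gives one hex number.
    value = int(mask_text.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1:
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


def affinity_mask_for(cpus):
    """Build the hex bitmask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")
```

For example, a mask of "3" decodes to CPUs 0 and 1, which is the kind of result one would expect if only two of four cores are ever used; all four cores would be "f".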

Cheers



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 15:43         ` Nick Piggin
@ 2007-11-20 19:07           ` Arjan van de Ven
  2007-11-20 20:02             ` Mark Lord
  2007-11-22  7:54             ` Nick Piggin
  0 siblings, 2 replies; 38+ messages in thread
From: Arjan van de Ven @ 2007-11-20 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Wed, 21 Nov 2007 02:43:46 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
> > On Tue, 20 Nov 2007 18:37:39 +1100
> >
> > Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > > actually.... no. IRQ balancing is not a "fast" decision; every
> > > > time you
> > >
> > > I didn't say anything of the sort. But IRQ load could still
> > > fluctuate a lot more rapidly than we'd like to wake up the
> > > irqbalancer.
> >
> > irq load fluctuates by definition. but acting on it faster isn't the
> > right thing.
> 
> Of course it is, if you want to effectively use your resources.
> Imagine if the task balancer only polled once every 10s.

but unlike the task balancer, moving an irq is really expensive.
(at least for networking and a few other similar systems)
And no it's not just the cache bouncing, it's the entire reassembly of
multiple packets etc etc that gets really messy.


> >
> > I assume you've read what/how irqbalance does; good luck convincing
> > people that that kind of policy belongs in the kernel.
> 
> Lots of code to get topology and device information.

yes this would go away in the kernel

> Some constants
> that make assumptions about the machine it is running on and may or
> may not agree with what the task scheduler is trying to do.

> Some
> classification stuff which makes guesses about how a particular bit of

you misunderstood this; the classification stuff is there to spread
different irqs of similar class (say networking) over multiple
cores/packages. Doing this is a system resource balancing proposition
not just a cpu time one. 

You may think this spreading based on classification is a mistake, but
it's based on the following observations:
1) servers with multiple network cards serving internet traffic out
really need to balance their load, so that per-cpu resources (such as
per-cpu memory pools) are used evenly. It also makes sure that under
network spikes on both interfaces, the response is sane
2) servers with multiple IO devices need this to be spread out, just
think of oracle etc.
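
The idea of spreading IRQs of the same class over different cores can be sketched roughly as below. This is a toy placement rule written for illustration, not irqbalance's actual algorithm: the function name `spread_by_class`, the class labels, and the tie-breaking order are all assumptions.

```python
def spread_by_class(irq_classes, cpus):
    """Place each IRQ on the CPU that has the fewest IRQs of the same class.

    irq_classes maps IRQ number -> class label (e.g. "net", "storage").
    Ties are broken by the CPU's total IRQ count, then by CPU number.
    Returns a dict mapping IRQ number -> chosen CPU.
    """
    class_load = {cpu: {} for cpu in cpus}
    total_load = {cpu: 0 for cpu in cpus}
    placement = {}
    # Walk the IRQs class by class so the result is deterministic.
    ordered = sorted(irq_classes.items(), key=lambda item: (item[1], item[0]))
    for irq, cls in ordered:
        target = min(
            cpus,
            key=lambda cpu: (class_load[cpu].get(cls, 0), total_load[cpu], cpu),
        )
        placement[irq] = target
        class_load[target][cls] = class_load[target].get(cls, 0) + 1
        total_load[target] += 1
    return placement
```

Note that this is a purely static placement: it looks only at device classes, never at observed interrupt rates, which is the distinction the next paragraph turns on.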

for both you could argue "but we could balance this based on actual
observed load in some way", but you can only do that if you rebalance
at a relatively high frequency, which you really don't want to do for
networking and probably even storage.

We used to rebalance this frequently in the 2.4-early kernels based on
a patch from Ingo. Turned out to be a really really bad idea;
performance really tanked.

> hardware or device driver wants to be balanced. Hacks to poll
> hotplugging and topology changes.

"hacks" as in "rescan".. so falls under the topology code and would
indeed be changed to hook into hotplug inside the kernel; just
different complexity.

> 
> I'm still convinced. Who isn't?

I know you can do SOME sort of balancing in the kernel. But please
describe the algorithm you would use; I started out with the same
thought but when it got down to the algorithm to me at least it became
clear "we really don't want this complexity in kernel mode".



> > > > etc. Balancing on a 10 second scale seems to work quite well; no
> > > > need to pull that complexity into the kernel....
> > >
> > > My perspective is that it isn't a good idea to have such a
> > > critical piece of infrastructure outside the kernel.
> >
> > kernel or kernel source? If there was a good place in the kernel
> > source I'd not be against moving irqbalance there. In the kernel...
> > not needed. (also because on single socket machines, the
> > irqbalancer basically has a one-shot task because there balancing
> > is effectively a static setup)
> 
> I don't think that's a good argument for not having it in kernel.

if you don't care about kernel unpagable memory footprint, fine.
Others do.


> > The same ("critical piece of infrastructure') can be said about
> > other things, like udev and ... even hal. Nobody is arguing for
> > moving those into the kernel though....
> 
> Maybe because there aren't any good arguments. I have good arguments
> for irq balancing, though, which aren't invalidated by this
> observation.

I'm not arguing against doing irqbalancing per se (heck that's why I
wrote irqbalance); just that every time I try to do it in kernel the
complexity to get the behavior people (and benchmarks) want turns me
right off that again.

> 
> 
> > > I want the kernel to balance interrupts and tasks fairly;
> >
> > with irqthreads that will come for free soon.
> 
> No it won't. It will balance irqthreads.

and it will know how much cpu they take and it'll move work around to
compensate for any unfairness. CFS is really good at that.

> > >maybe move
> > > interrupts closer to the tasks they are interacting with (instead
> > > of, or combined with our current policy of moving tasks near the
> > > interrupts, which can be much more damaging for cache and NUMA);
> >
> > interrupts and tasks have an N:M relationship.... or sometimes 1:M
> > where tasks only depend on one irq. Moving the irq around then
> > tends to be a loss. For NUMA, you actually very likely want the IRQ
> > on the node that the IO is associated with.
> 
> And the kernel knows all this intimately. And it isn't always that
> straightforward. And even if it were for NUMA, you still have SMP
> within NUMA.

for now yes. I agree the kernel "knows" this to some form (well it
COULD know). I just don't believe the "extra" information it has is in
practice useful for making decisions on.


> 
> 
> > > move
> > > all interrupts to a single core when there is enough capacity and
> > > we are balancing for power savings;
> >
> > irqbalance does that today.
> 
> To the same core which the task scheduler moves tasks? If so, I missed
> that. Still, I guess that's the easiest thing to do.

yes; the power aware scheduler also moves processes to the first
package .. as does irqbalance.

> >
> > I listed a few;
> > 1) it's policy
> 
> I don't think that's such a constructive point. Task balancing is
> policy in exactly the same way.

not really; CFS has shown that.... the only real policy in task
balancing is the fairness part, and that seems to be generally accepted
as the right thing.



> More out of place IMO, is irqbalance has things like checking for
> NAPI turned on in a driver and in that case it does something specific
> according to its knowledge of kernel implementation details.

no it doesn't; it uses "packet counts" to deal with NAPI and other
effects such as irq mitigation to get a more accurate estimate of load
caused by an irq, but it's not fair to call this inappropriate checking
for NAPI being turned on.




-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20  4:17 ` Nick Piggin
  2007-11-20  4:29   ` Willy Tarreau
  2007-11-20  5:37   ` Arjan van de Ven
@ 2007-11-20 19:17   ` Andi Kleen
  2 siblings, 0 replies; 38+ messages in thread
From: Andi Kleen @ 2007-11-20 19:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Nick Piggin <nickpiggin@yahoo.com.au> writes:
>
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace.

There is a lot of possible policy in it 

> It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

Integrating with the task balancer really only makes sense if the
device supports MSI-X and if it does that you don't really need
an irq balancer because you can just send to all CPUs as needed.

Without MSI-X you would be trying to reprogram the interrupts
all the time when a task is migrating and it is highly doubtful
that doing that automatically would do any good.

-Andi


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 19:07           ` Arjan van de Ven
@ 2007-11-20 20:02             ` Mark Lord
  2007-11-20 21:58               ` Arjan van de Ven
  2007-11-22  7:54             ` Nick Piggin
  1 sibling, 1 reply; 38+ messages in thread
From: Mark Lord @ 2007-11-20 20:02 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Arjan van de Ven wrote:
> On Wed, 21 Nov 2007 02:43:46 +1100
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
>>> On Tue, 20 Nov 2007 18:37:39 +1100
>>>
>>> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>>>> actually.... no. IRQ balancing is not a "fast" decision; every
>>>>> time you
>>>> I didn't say anything of the sort. But IRQ load could still
>>>> fluctuate a lot more rapidly than we'd like to wake up the
>>>> irqbalancer.
>>> irq load fluctuates by definition. but acting on it faster isn't the
>>> right thing.
>> Of course it is, if you want to effectively use your resources.
>> Imagine if the task balancer only polled once every 10s.
> 
> but unlike the task balancer, moving an irq is really expensive.
> (at least for networking and a few other similar systems)
> And no it's not just the cache bouncing, it's the entire reassembly of
> multiple packets etc etc that gets really messy.
> 
> 
>>> I assume you've read what/how irqbalance does; good luck convincing
>>> people that that kind of policy belongs in the kernel.
>> Lots of code to get topology and device information.
> 
> yes this would go away in the kernel
> 
>> Some constants
>> that make assumptions about the machine it is running on and may or
>> may not agree with what the task scheduler is trying to do.
> 
>> Some
>> classification stuff which makes guesses about how a particular bit of
> 
> you misunderstood this; the classification stuff is there to spread
> different irqs of similar class (say networking) over multiple
> cores/packages. Doing this is a system resource balancing proposition
> not just a cpu time one. 
> 
> You may think this spreading based on classification is a mistake, but
> it's based on the following observation: 
> 1) servers with multiple network cards serving internet traffic out
> really need to load balance their loads; this is for various per-cpu
> resource reasons (such as per cpu memory pools) to be evenly used. It
> also makes sure that under network spikes on both interfaces, the
> response is sane
> 2) servers with multiple IO devices need this to be spread out, just
> think of oracle etc.
> 
> for both you could argue "but we could balance this based on actual
> observed load in some way", but you can only do that if you rebalance
> at a relatively high frequency, which you really don't want to do for
> networking and probably even storage.
> 
> We used to rebalance this frequently in the 2.4-early kernels based on
> a patch from Ingo. Turned out to be a really really bad idea;
> performance really tanked.
> 
>> hardware or device driver wants to be balanced. Hacks to poll
>> hotplugging and topology changes.
> 
> "hacks" as in "rescan".. so falls under the topology code and would
> indeed be changed to hook into hotplug inside the kernel; just
> different complexity.
> 
>> I'm still convinced. Who isn't?
> 
> I know you can do SOME sort of balancing in the kernel. But please
> describe the algorithm you would use; I started out with the same
> thought but when it got down to the algorithm to me at least it became
> clear "we really don't want this complexity in kernel mode".
..

Well, for my dualCore notebook, dualCore MythTV box, and QuadCore desktop,
the behaviour of the existing, working, 32-bit kernel IRQBALANCE code
outperforms the userspace utility.

Mostly, I suspect, due to its much faster response to changing conditions.
That's something the external one could try to match, but at present it
seems tuned specifically for high-traffic network servers, not for the
average notebook or desktop.

Cheers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 20:02             ` Mark Lord
@ 2007-11-20 21:58               ` Arjan van de Ven
  2007-11-20 23:17                 ` Mark Lord
  0 siblings, 1 reply; 38+ messages in thread
From: Arjan van de Ven @ 2007-11-20 21:58 UTC (permalink / raw)
  To: Mark Lord
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Tue, 20 Nov 2007 15:02:43 -0500
Mark Lord <lkml@rtr.ca> wrote:
> ..
> 
> Well, for my dualCore notebook, dualCore MythTV box, and QuadCore
> desktop, the behaviour of the existing, working, 32-bit kernel
> IRQBALANCE code outperforms the userspace utility.
> 
> Mostly, I suspect, due to its much faster response to changing
> conditions. That's something the external one could try to match, but
> at present it seems tuned specifically for high-traffic network
> servers, not for the average notebook or desktop.

I'd really like to see what it's doing before commenting on this;
at minimum can you give me the /proc/interrupts of the system?
It might be a simple bug or simple missing item, not a total "scratch the
full system".
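
As a rough aid for reading that output, the sketch below parses /proc/interrupts-style text into per-CPU counts and lists which CPUs have actually serviced a given IRQ. The `SAMPLE` text is invented for illustration (it is not from the system under discussion), and the helper names `parse_interrupts` and `active_cpus` are hypothetical.

```python
# Invented sample in the layout of /proc/interrupts on a four-core machine.
SAMPLE = """
           CPU0       CPU1       CPU2       CPU3
  0:        123          0          0          0   IO-APIC-edge      timer
 16:      50000          0         10          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, counts), where counts maps label -> per-CPU list."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:len(cpus)]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        # Skip summary rows (such as ERR) that lack one count per CPU.
        if len(per_cpu) == len(cpus):
            counts[label.strip()] = per_cpu
    return cpus, counts


def active_cpus(cpus, counts, irq):
    """Names of the CPUs that have handled at least one interrupt for irq."""
    return [cpu for cpu, n in zip(cpus, counts[irq]) if n > 0]
```

Taking two such snapshots a few seconds apart and subtracting the counts would give per-IRQ rates per core, which is the kind of data being asked for here.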


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 14:47       ` Arjan van de Ven
  2007-11-20 15:43         ` Nick Piggin
  2007-11-20 15:47         ` Mark Lord
@ 2007-11-20 22:01         ` Ingo Molnar
  2007-11-20 23:22           ` Mark Lord
  2 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-11-20 22:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Mark Lord, Andrew Morton, Linus Torvalds,
	Linux Kernel, H. Peter Anvin, Thomas Gleixner


* Arjan van de Ven <arjan@infradead.org> wrote:

> kernel or kernel source? If there was a good place in the kernel 
> source I'd not be against moving irqbalance there. [...]

would this be a good case study to use klibc and start up irqbalanced 
automatically? I'd love it if we moved more of the 'system support' 
userspace into the kernel proper, to keep it under control. (and to 
simplify the compatibility and QA matrix)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 21:58               ` Arjan van de Ven
@ 2007-11-20 23:17                 ` Mark Lord
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Lord @ 2007-11-20 23:17 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 15:02:43 -0500
> Mark Lord <lkml@rtr.ca> wrote:
>> ..
>>
>> Well, for my dualCore notebook, dualCore MythTV box, and QuadCore
>> desktop, the behaviour of the existing, working, 32-bit kernel
>> IRQBALANCE code outperforms the userspace utility.
>>
>> Mostly, I suspect, due to its much faster response to changing
>> conditions. That's something the external one could try to match, but
>> at present it seems tuned specifically for high-traffic network
>> servers, not for the average notebook or desktop.
> 
> I'd really like to see what it's doing before commenting on this;
> at minimum can you give me the /proc/interrupts of the system?
> It might be a simple bug or simple missing item, not a total "scratch the
> full system".
..

Next time I'm doing something significant there,
I'll collect some data for you.  Got other work now.

But it does make sense that this mechanism cannot be longterm for
a desktop.  Intensive loads come and go quickly there, and the
interrupt handling has to respond in a timely fashion.

It's not like a server where loads generally increase/decrease gradually.

Cheers


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 22:01         ` Ingo Molnar
@ 2007-11-20 23:22           ` Mark Lord
  2007-11-20 23:27             ` Ingo Molnar
  2007-11-20 23:28             ` H. Peter Anvin
  0 siblings, 2 replies; 38+ messages in thread
From: Mark Lord @ 2007-11-20 23:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Nick Piggin, Andrew Morton, Linus Torvalds,
	Linux Kernel, H. Peter Anvin, Thomas Gleixner

Ingo Molnar wrote:
> * Arjan van de Ven <arjan@infradead.org> wrote:
> 
>> kernel or kernel source? If there was a good place in the kernel 
>> source I'd not be against moving irqbalance there. [...]
> 
> would this be a good case study to use klibc and start up irqbalanced 
> automatically? I'd love it if we moved more of the 'system support' 
> userspace into the kernel proper, to keep it under control. (and to 
> simplify the compatibility and QA matrix)
..

Perhaps, but this also violates the principle that the kernel
should just *work* with sensible defaults.  I don't use an initrd,
or an initramfs, and have no intention of ever doing so.

I *like* having a single boot image with no unneeded/unwanted complexity.
It's only recently that I've even come round to using some loadable
modules for things like network drivers -- I prefer a single image
for as much as possible (like Linus there).

If putting a C-library and utilities "into the kernel" still leaves
me with a single image file, then.. maybe.  Seems clumsy, though.

Handling interrupts efficiently is a very basic, core function
for any operating system kernel.  With CONFIG_IRQBALANCE=y, Linux is
fine at present.  But that's not available in 64-bit mode,
so we have a deficiency there.

I guess I'll patch it into my kernels soon-ish.

Cheers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:22           ` Mark Lord
@ 2007-11-20 23:27             ` Ingo Molnar
  2007-11-20 23:33               ` H. Peter Anvin
  2007-11-20 23:28             ` H. Peter Anvin
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-11-20 23:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: Arjan van de Ven, Nick Piggin, Andrew Morton, Linus Torvalds,
	Linux Kernel, H. Peter Anvin, Thomas Gleixner


* Mark Lord <lkml@rtr.ca> wrote:

> Perhaps, but this also violates the principle that the kernel should 
> just *work* with sensible defaults.  I don't use an initrd, or an 
> initramfs, and have no intention of ever doing so.

nor do i - i was under the impression that klibc was able to work out of 
a bzImage too? Am i wrong?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:22           ` Mark Lord
  2007-11-20 23:27             ` Ingo Molnar
@ 2007-11-20 23:28             ` H. Peter Anvin
  1 sibling, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-20 23:28 UTC (permalink / raw)
  To: Mark Lord
  Cc: Ingo Molnar, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner

Mark Lord wrote:
> 
> Perhaps, but this also violates the principle that the kernel
> should just *work* with sensible defaults.  I don't use an initrd,
> or an initramfs, and have no intention of ever doing so.
> 
> I *like* having a single boot image with no unneeded/unwanted complexity.
> It's only recently that I've even come round to using some loadable
> modules for things like network drivers -- I prefer a single image
> for as much as possible (like Linus there).
> 
> If putting a C-library and utilities "into the kernel" still leaves
> me with a single image file, then.. maybe.  Seems clumsy, though.
> 

That was the whole point of klibc, and in fact it was in -mm that way 
for a while.  Linus rejected it at the time on the grounds that it added 
no new features, only moved existing features to userspace.

The unified build tree has since then bitrotted slightly due to lack of 
time on my part, but it wouldn't be hard at all to bring it up to current.

	-hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:27             ` Ingo Molnar
@ 2007-11-20 23:33               ` H. Peter Anvin
  2007-11-20 23:47                 ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-20 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner

Ingo Molnar wrote:
> * Mark Lord <lkml@rtr.ca> wrote:
> 
>> Perhaps, but this also violates the principle that the kernel should 
>> just *work* with sensible defaults.  I don't use an initrd, or an 
>> initramfs, and have no intention of ever doing so.
> 
> nor do i - i was under the impression that klibc was able to work out of 
> a bzImage too? Am i wrong?
> 

Nope.  It runs inside an initramfs, of course; that initramfs is linked 
into the kernel binary.

	-hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:33               ` H. Peter Anvin
@ 2007-11-20 23:47                 ` Ingo Molnar
  2007-11-20 23:50                   ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-11-20 23:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner


* H. Peter Anvin <hpa@zytor.com> wrote:

> Ingo Molnar wrote:
>> * Mark Lord <lkml@rtr.ca> wrote:
>>
>>> Perhaps, but this also violates the principle that the kernel should just 
>>> *work* with sensible defaults.  I don't use an initrd, or an initramfs, 
>>> and have no intention of ever doing so.
>>
>> nor do i - i was under the impression that klibc was able to work out of a 
>> bzImage too? Am i wrong?
>>
>
> Nope.  It runs inside an initramfs, of course; that initramfs is 
> linked into the kernel binary.

would be nice to have a single-image variant for all of this. having the 
separate initrd was always trouble - and it's pointless as well. (we 
rarely update the initrd without updating the vmlinuz as well)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:47                 ` Ingo Molnar
@ 2007-11-20 23:50                   ` H. Peter Anvin
  2007-11-21  0:07                     ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-20 23:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner

Ingo Molnar wrote:
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> Ingo Molnar wrote:
>>> * Mark Lord <lkml@rtr.ca> wrote:
>>>
>>>> Perhaps, but this also violates the principle that the kernel should just 
>>>> *work* with sensible defaults.  I don't use an initrd, or an initramfs, 
>>>> and have no intention of ever doing so.
>>> nor do i - i was under the impression that klibc was able to work out of a 
>>> bzImage too? Am i wrong?
>>>
>> Nope.  It runs inside an initramfs, of course; that initramfs is 
>> linked into the kernel binary.
> 
> would be nice to have a single-image variant for all of this. having the 
> separate initrd was always trouble - and it's pointless as well. (we 
> rarely update the initrd without updating the vmlinuz as well)
> 

We do.  Am I missing something?

	-hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 23:50                   ` H. Peter Anvin
@ 2007-11-21  0:07                     ` Ingo Molnar
  2007-11-21  0:20                       ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-11-21  0:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner


* H. Peter Anvin <hpa@zytor.com> wrote:

>>> Nope.  It runs inside an initramfs, of course; that initramfs is 
>>> linked into the kernel binary.
>>
>> would be nice to have a single-image variant for all of this. having 
>> the separate initrd was always trouble - and it's pointless as well. 
>> (we rarely update the initrd without updating the vmlinuz as well)
>
> We do.  Am I missing something?

do we have a single-image way of getting both the kernel image and the 
initramfs set up at once? What i know of is a two-image approach: vmlinuz 
and initrd.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-21  0:07                     ` Ingo Molnar
@ 2007-11-21  0:20                       ` H. Peter Anvin
  2007-11-21  0:36                         ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-21  0:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner

Ingo Molnar wrote:
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>>>> Nope.  It runs inside an initramfs, of course; that initramfs is 
>>>> linked into the kernel binary.
>>> would be nice to have a single-image variant for all of this. having 
>>> the separate initrd was always trouble - and it's pointless as well. 
>>> (we rarely update the initrd without updating the vmlinuz as well)
>> We do.  Am I missing something?
> 
> do we have a single-image way of getting both the kernel image and the 
> initram set up at once? What i know of is a two-image approach: vmlinuz 
> and initrd.
> 

Yes, we do.  The initramfs can be linked into the kernel image.  The 
unified klibc build tree does that by default.

	-hpa


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-21  0:20                       ` H. Peter Anvin
@ 2007-11-21  0:36                         ` Ingo Molnar
  2007-11-21  0:47                           ` H. Peter Anvin
  2007-11-21  2:48                           ` Jeff Garzik
  0 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-11-21  0:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner


* H. Peter Anvin <hpa@zytor.com> wrote:

> Ingo Molnar wrote:
>> * H. Peter Anvin <hpa@zytor.com> wrote:
>>
>>>>> Nope.  It runs inside an initramfs, of course; that initramfs is linked 
>>>>> into the kernel binary.
>>>> would be nice to have a single-image variant for all of this. having the 
>>>> separate initrd was always trouble - and it's pointless as well. (we 
>>>> rarely update the initrd without updating the vmlinuz as well)
>>> We do.  Am I missing something?
>>
>> do we have a single-image way of getting both the kernel image and the 
>> initram set up at once? What i know of is a two-image approach: vmlinuz 
>> and initrd.
>>
>
> Yes, we do.  The initramfs can be linked into the kernel image.  The 
> unified klibc build tree does that by default.

argh. Guess i misread your answer:

>>>>>> nor do i - i was under the impression that klibc was able to work 
>>>>>> out of a bzImage too? Am i wrong?
>>>>> Nope.  It runs inside an initramfs, of course; that initramfs is linked 
>>>>> into the kernel binary.

i took that "Nope" as referring to my impression - but you in fact meant 
that i am not wrong? :-) So nothing to see here. single-bzImage initrd 
was and is possible, so we could in fact move chunks of system-related 
userland (such as irqbalanced) into the kernel proper?

	Ingo


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-21  0:36                         ` Ingo Molnar
@ 2007-11-21  0:47                           ` H. Peter Anvin
  2007-11-21  2:48                           ` Jeff Garzik
  1 sibling, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-21  0:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Lord, Arjan van de Ven, Nick Piggin, Andrew Morton,
	Linus Torvalds, Linux Kernel, Thomas Gleixner

Ingo Molnar wrote:
> 
> i took that "Nope" as referring to my impression - but you in fact meant 
> that i am not wrong? :-) So nothing to see here. single-bzImage initrd 
> was and is possible, so we could in fact move chunks of system-related 
> userland (such as irqbalanced) into the kernel proper?
> 

Yes, it should be quite straightforward.

	-hpa


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-21  0:36                         ` Ingo Molnar
  2007-11-21  0:47                           ` H. Peter Anvin
@ 2007-11-21  2:48                           ` Jeff Garzik
  2007-11-21  2:59                             ` H. Peter Anvin
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff Garzik @ 2007-11-21  2:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Mark Lord, Arjan van de Ven, Nick Piggin,
	Andrew Morton, Linus Torvalds, Linux Kernel, Thomas Gleixner

Ingo Molnar wrote:
> single-bzImage initrd 
> was and is possible,

Correct (though s/initrd/initramfs/).

Take a look at usr/Makefile for how initramfs is automatically included 
in the image, right now.

The intention at the time was to quickly follow up this stub (generated 
by gen_init_cpio) with a full inclusion of klibc + some basics like 
nfsroot.  It should be a very straightforward step to go from what we 
have today to including klibc initramfs into the kernel image.
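
To make the list format concrete: usr/gen_init_cpio reads a plain-text description with one entry per line (dir, nod, file and so on). The Python sketch below only generates such a description; it does not build a cpio archive. The helper names and the /sbin/irqbalance entry are hypothetical, picked to match the irqbalanced idea floated in this thread, and the three default entries are an assumption about what the stub list contains.

```python
# Sketch: emit a gen_init_cpio-style description of a tiny initramfs.
# The entry keywords (dir, nod, file) follow the list format read by
# usr/gen_init_cpio. Helper names and the irqbalance paths are hypothetical.
import posixpath


def dir_entry(name, mode=0o755, uid=0, gid=0):
    # dir <name> <mode> <uid> <gid>
    return f"dir {name} {mode:04o} {uid} {gid}"


def nod_entry(name, mode, dev_type, major, minor, uid=0, gid=0):
    # nod <name> <mode> <uid> <gid> <dev_type> <maj> <min>
    return f"nod {name} {mode:04o} {uid} {gid} {dev_type} {major} {minor}"


def file_entry(name, source, mode=0o755, uid=0, gid=0):
    # file <name> <location on build host> <mode> <uid> <gid>
    return f"file {name} {source} {mode:04o} {uid} {gid}"


def build_list(extra_files=()):
    """Return the list text: assumed stub entries plus optional extra files."""
    lines = [
        dir_entry("/dev"),
        nod_entry("/dev/console", 0o600, "c", 5, 1),
        dir_entry("/root", 0o700),
    ]
    seen = {"/", "/dev", "/root"}
    for name, source in extra_files:
        # Only the immediate parent directory is created in this sketch.
        parent = posixpath.dirname(name)
        if parent not in seen:
            lines.append(dir_entry(parent))
            seen.add(parent)
        lines.append(file_entry(name, source))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_list([("/sbin/irqbalance", "build/irqbalance")]), end="")
```

The resulting text could be saved to a file and handed to the kernel build through CONFIG_INITRAMFS_SOURCE, assuming the tree in question supports that option. How distros would then override the default policy is exactly the open question raised below.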


>  so we could in fact move chunks of system-related 
> userland (such as irqbalanced) into the kernel proper?

s/kernel/kernel tree/ I presume you mean...

With regards to irqbalanced, if you are thinking about including it in 
initramfs, you would need to work out the details of how 
userland/distros modify the default policy configurations.

	Jeff





* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-21  2:48                           ` Jeff Garzik
@ 2007-11-21  2:59                             ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2007-11-21  2:59 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ingo Molnar, Mark Lord, Arjan van de Ven, Nick Piggin,
	Andrew Morton, Linus Torvalds, Linux Kernel, Thomas Gleixner

Jeff Garzik wrote:
> 
> Take a look at usr/Makefile for how initramfs is automatically included 
> in the image, right now.
> 
> The intention at the time was to quickly follow up this stub (generated 
> by gen_init_cpio) with a full inclusion of klibc + some basics like 
> nfsroot.  It should be a very straightforward step to go from what we 
> have today to including klibc initramfs into the kernel image.
> 

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-klibc.git;a=summary

	-hpa


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-20 19:07           ` Arjan van de Ven
  2007-11-20 20:02             ` Mark Lord
@ 2007-11-22  7:54             ` Nick Piggin
  2007-11-23 13:09               ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2007-11-22  7:54 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Mark Lord, Andrew Morton, Linus Torvalds, Ingo Molnar, Linux Kernel

On Wednesday 21 November 2007 06:07, Arjan van de Ven wrote:
> On Wed, 21 Nov 2007 02:43:46 +1100

> > Of course it is, if you want to effectively use your resources.
> > Imagine if the task balancer only polled once every 10s.
>
> but unlike the task balancer, moving an irq is really expensive.
> (at least for networking and a few other similar systems)
> And no, it's not just the cache bouncing, it's the entire reassembly of
> multiple packets etc etc that gets really messy.

Actually a blanket statement like that is just wrong. Moving a
network interrupt yes is probably quite expensive, but it is
about the worst case one to move. What's more, moving tasks between
NUMA nodes could easily be many orders of magnitude worse than the
transient slowdown of moving irqs.

Furthermore, what you say doesn't really seem to be an argument
for doing it in userspace or an argument against moving IRQs. It
actually shows that there are complex, hardware and kernel
implementation dependent issues, all of which suggest it is better
to be in kernel.


> > Some constants
> > that make assumptions about the machine it is running on and may or
> > may not agree with what the task scheduler is trying to do.
> >
> > Some
> > classification stuff which makes guesses about how a particular bit of
>
> you misunderstood this; the classification stuff is there to spread
> different irqs of similar class (say networking) over multiple
> cores/packages. Doing this is a system resource balancing proposition
> not just a cpu time one.
>
> You may think this spreading based on classification is a mistake, but
> it's based on the following observation:

No I'm not misunderstanding or think it is a mistake. But it is
something which the kernel and the devices themselves should have
better knowledge of. You have a process which is reading off disk
and sending to a network interface? You may well want to put the
process and the disk interrupt and the network interrupt all on
the same CPU.
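
For reference, this kind of co-location can already be set up by hand from userspace, which is roughly what any balancer ends up doing. The Python sketch below builds the hex CPU bitmask that /proc/irq/<n>/smp_affinity expects and pins a task with os.sched_setaffinity. The function names, the IRQ numbers and the PID in the usage comment are hypothetical, and the mask helper assumes CPU ids below 32 so that a single hex word is enough.

```python
# Sketch: put a task and its disk/network interrupts on the same CPU.
# Assumes Linux, root privileges for the /proc writes, and CPU ids < 32.
import os


def cpus_to_mask(cpus):
    """Return the hex bitmask string for a collection of CPU ids."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError(f"CPU id out of range for this sketch: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_path(irq):
    """Path of the per-IRQ affinity file in procfs."""
    return f"/proc/irq/{irq}/smp_affinity"


def pin_irq(irq, cpus):
    """Route one IRQ to the given CPUs (requires root)."""
    with open(affinity_path(irq), "w") as f:
        f.write(cpus_to_mask(cpus))


def colocate(pid, irqs, cpu):
    """Pin a task and a set of IRQs to a single CPU."""
    os.sched_setaffinity(pid, {cpu})
    for irq in irqs:
        pin_irq(irq, [cpu])


# Hypothetical usage: task 1234, disk IRQ 19 and NIC IRQ 24, all on CPU 2.
# colocate(1234, [19, 24], 2)
```

The point of the sketch is only that the mechanism is simple; deciding which task belongs with which interrupt is the part that needs knowledge the kernel and the drivers have.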

[snip]

> We used to rebalance this frequently in the 2.4-early kernels based on
> a patch from Ingo. Turned out to be a really really bad idea;
> performance really tanked.

To reiterate, I do not think that IRQs should be moved more frequently.
I think the kernel is in the position to know far better than userspace
about irq balancing.


> > hardware or device driver wants to be balanced. Hacks to poll
> > hotplugging and topology changes.
>
> "hacks" as in "rescan".. so falls under the topology code and would
> indeed be changed to hook into hotplug inside the kernel; just
> different complexity.

ie. simpler. All the topology stuff would be far simpler.


> > I'm still convinced. Who isn't?
>
> I know you can do SOME sort of balancing in the kernel. But please
> describe the algorithm you would use; I started out with the same
> thought but when it got down to the algorithm to me at least it became
> clear "we really don't want this complexity in kernel mode".

I'd rather not go this far into handwaving. I'm not saying that
I know exactly how it should work right now. I'm questioning the
established viewpoint that irq balancing belongs in userspace.

For that matter, I guess from the results you get, it's not terribly
bad to do in userspace or anything. But I think it can be done in
kernel.

Policy... I think that's a misused argument. The "policy" of any
kernel code I write is to utilise the hardware as efficiently as
possible within restrictions (eg. fairness, permissions). Setting
those restrictions is the realm of userspace, otherwise IMO it is
fine to go in kernel.

Using the same argument, task balancing and even scheduling is
policy, so is page reclaim, page writeback, filesystem block
allocation, etc. Now many of those things can be directed or
restricted somehow from userspace, and in-kernel irq balancing
would be no different.


> > > not needed. (also because on single socket machines, the
> > > irqbalancer basically has a one-shot task because there balancing
> > > is effectively a static setup)
> >
> > I don't think that's a good argument for not having it in kernel.
>
> if you don't care about kernel unpagable memory footprint, fine.
> Others do.

It would be a couple of K, right? I mean it would be probably less than
half the code of irqbalance because of the parsing and topology stuff.

Also, I don't think the one-shot behaviour on single socket machines is
good policy at all, and it can't capture dynamic behaviour at all.


> > > I listed a few;
> > > 1) it's policy
> >
> > I don't think that's such a constructive point. Task balancing is
> > policy in exactly the same way.
>
> not really; CFS has shown that.... the only real policy in task
> balancing is the fairness part,

Ahh, hate to get off topic, but let's not perpetuate this myth.
It wasn't Con, or CFS, or anything that showed fairness is some
great new idea. Actually I was arguing for fairness first,
against both Con and Ingo, way back when the old scheduler was
having so many problems.

Not that I am trying to claim the idea for myself. Fairness is
like the most fundamental and obvious behaviour for any sort of
resource scheduler that I have to laugh when people get "credited"
with this idea.

Back on topic... no, fairness is not the only real policy. Not at
all. Fairness is one of the most important ones, and that is exactly
why it is the default behaviour. After that, deviation from that is
a userspace thing.


> and that seems to be general accepted 
> as the right thing.
>
> > More out of place IMO, is irqbalance has things like checking for
> > NAPI turned on in a driver and in that case it does something specific
> > according to its knowledge of kernel implementation details.
>
> no it doesn't; it uses "packet counts" to deal with NAPI and other
> effects such as irq mitigation to get a more accurate estimate of load
> caused by an irq, but it's not fair to call this inappropriate checking
> for NAPI being turned on.

I don't think it is inappropriate. Obviously it needs to check it
to do something close to the right thing. This to me is yet another
signal that says the kernel is the right place for it.

Anyway, I'm clearly not going to change your mind. But I do have an
idea of the rationale for doing it in userspace now. So if I wanted
to challenge that, I guess I'd have to write code and prove it...



* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-22  7:54             ` Nick Piggin
@ 2007-11-23 13:09               ` Ingo Molnar
  2007-11-25 10:03                 ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-11-23 13:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Arjan van de Ven, Mark Lord, Andrew Morton, Linus Torvalds, Linux Kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Ahh, hate to get off topic, but let's not perpetuate this myth. It 
> wasn't Con, or CFS, or anything that showed fairness is some great new 
> idea. Actually I was arguing for fairness first, against both Con and 
> Ingo, way back when the old scheduler was having so many problems.
> 
> Not that I am trying to claim the idea for myself. Fairness is like 
> the most fundamental and obvious behaviour for any sort of resource 
> scheduler that I have to laugh when people get "credited" with this 
> idea.

just out of curiosity (and to get my own sense of history corrected), do 
you remember in which thread you said that? (and even better, could you 
dig out any URLs for that thread?)

btw., the question was never really whether fairness was a good idea for 
a resource scheduler - the question was whether _strict fairness_ was a 
good idea for a general purpose OS (and the desktop in particular). My 
point back then was that strict fairness is not good enough and that we 
thus need the interactivity estimator - and i still maintain the first 
half of that position while conceding that i was wrong about the second 
part :-)

I don't think anyone was arguing for a scheduler with no fairness at all
- but "fairness" indeed was more of an after-thought, not the driving 
principle.

Current CFS uses a modified "sleeper fairness" model (not a strict 
fairness model) via which we in essence replace the effect of the 
interactivity estimator with "sleeper fairness". So in essence we've 
replaced the O(1) scheduler's sleep average code with a deterministic 
sleep average code. This in turn also made the allocation of CPU time 
deterministic throughout. (which in other words can also be called "fair 
allocation of CPU time")

_That_ scheme seems to behave rather well in practice and i think i can 
take credit for _that_ bit ;-) [many people have hacked upon that 
concept and code since then so it's nowhere near "my code" anymore, of 
course.]

	Ingo


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
  2007-11-23 13:09               ` Ingo Molnar
@ 2007-11-25 10:03                 ` Nick Piggin
  0 siblings, 0 replies; 38+ messages in thread
From: Nick Piggin @ 2007-11-25 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Mark Lord, Andrew Morton, Linus Torvalds, Linux Kernel

On Saturday 24 November 2007 00:09, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > Ahh, hate to get off topic, but let's not perpetuate this myth. It
> > wasn't Con, or CFS, or anything that showed fairness is some great new
> > idea. Actually I was arguing for fairness first, against both Con and
> > Ingo, way back when the old scheduler was having so many problems.
> >
> > Not that I am trying to claim the idea for myself. Fairness is like
> > the most fundamental and obvious behaviour for any sort of resource
> > scheduler that I have to laugh when people get "credited" with this
> > idea.
>
> just out of curiosity (and to get my own sense of history corrected), do
> you remember in which thread you said that? (and even better, could you
> dig out any URLs for that thread?)

No, I have no idea except for the vague talking pictures in my noggin ;)
"nicksched", maybe? Or Con's patches on the old scheduler (around 2.6.0
time, was it?)


> btw., the question was never really whether fairness was a good idea for
> a resource scheduler - the question was whether _strict fairness_ was a
> good idea for a general purpose OS (and the desktop in particular). My
> point back then was that strict fairness is not good enough and that we
> thus need the interactivity estimator - and i still maintain the first
> half of that position while conceding that i was wrong about the second
> part :-)

I'm not sure what you mean by strict fairness. Obviously there are
fundamental points where you have to make some heuristic choice about
priority -- process creation/destruction, and sleep patterns in
particular. So yes, you do need decaying priority.

But if all that is applied consistently, it shouldn't be possible for
a process to get more CPU time than another of the same (or more)
demand, over a given period.


> I don't think anyone was arguing for a scheduler with no fairness at all
> - but "fairness" indeed was more of an after-thought, not the driving
> principle.

And actually it was systemically unfair by design ;) That's where
most of the bad behavioural corner cases came in.


> Current CFS uses a modified "sleeper fairness" model (not a strict
> fairness model) via which we in essence replace the effect of the
> interactivity estimator with "sleeper fairness". So in essence we've
> replaced the O(1) scheduler's sleep average code with a deterministic
> sleep average code. This in turn also made the allocation of CPU time
> deterministic throughout. (which in other words can also be called "fair
> allocation of CPU time")

Yeah, it's OK I guess. I think it is quite complex -- you're dealing
with a complete heuristic anyway, so while the equations may look nice,
I don't actually know what justifies the equations themselves (not that
*any* scheduler can be completely justified in that way, but...). But
at least there is fairness and some rationale for it.

Nicksched had what I'd call a deterministic sleep average code too
(though much simpler). The big problem it had was that it also had
to scale timeslices back when there were high priority processes on
the runqueue in order to keep latency down while retaining O(1)
scheduling. It was hard or impossible to do exactly right. It would
have been easy with an O(lgn) data structure, though :P
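
As an illustration of the O(log n) point, here is a toy sketch in Python. It is neither nicksched nor CFS (CFS keys a red-black tree on virtual runtime; this uses a binary heap for brevity), and every name in it is made up for the example. Each task carries a virtual runtime that advances more slowly for heavier weights, and the scheduler always runs the task with the smallest value, so no task can pull ahead of another with the same weight and demand.

```python
# Toy fair picker: always run the task with the smallest virtual runtime.
# Both the pop and the push on the heap are O(log n) in the task count.
import heapq


def run_fair(weights, slices, slice_len=1.0):
    """Simulate `slices` scheduling decisions; return CPU time per task.

    weights: dict mapping task name to a positive weight.
    """
    # Heap entries are (virtual_runtime, name); ties break on the name.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    cpu_time = {name: 0.0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(heap)
        cpu_time[name] += slice_len
        # A heavier task accrues virtual runtime more slowly, so it is
        # picked more often; equal weights give equal shares.
        vruntime += slice_len / weights[name]
        heapq.heappush(heap, (vruntime, name))
    return cpu_time


if __name__ == "__main__":
    print(run_fair({"a": 1, "b": 1, "c": 1}, 30))
    print(run_fair({"a": 2, "b": 1}, 30))
```

The sketch ignores sleeping tasks entirely, which is where the heuristic choices about priority discussed above actually come in.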


> _That_ scheme seems to behave rather well in practice and i think i can
> take credit for _that_ bit ;-) [many people have hacked upon that
> concept and code since then so it's nowhere near "my code" anymore, of
> course.]

I found that just doing something relatively sane (eg. a simple, fair,
decaying priority system) that doesn't violate the principle of least
surprise (ie. that unix apps and programmers have expected over the
years) has resulted in good behaviour.

I don't really know about taking credit for ideas. Probably your exact
algorithm is unique, but there is a lot of research on CPU schedulers
I have never reviewed, so I can't say. Still, if you came up with it
independently, I guess that is the main thing for one's ego ;)

Still, what I can say with at least one counterexample is that fairness
is not a new concept (curious: wasn't the 2.4 scheduler fair?).


* Re: CONFIG_IRQBALANCE for 64-bit x86 ?
@ 2007-11-21  2:22 Walt H
  0 siblings, 0 replies; 38+ messages in thread
From: Walt H @ 2007-11-21  2:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: lkml, arjan, nickpiggin

>
> On Tue, 20 Nov 2007 15:17:15 +1100
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > > but not on 64-bit x86.  Why not?
>
> because the in-kernel one is actually quite bad.
>
>
> > > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > > but responsiveness sucks bigtime when run in 64-bit mode (no
> > > IRQBALANCE) during periods of multiple heavy I/O streams (USB flash
> > > drives).
>
> please run the userspace irq balancer, see http://www.irqbalance.org
> afaik most distros ship that by default anyway.

I've been running the daemon for quite some time, but I have noticed
something on my newest computer.  It's a core2 duo and the IRQ balance 
daemon always exits after some time.  After looking at the source, I see 
it's because dual core/hyperthreaded boxes (single domain caches) always 
get treated as though the --oneshot option were passed and exit after 
the first pass (I assume same thing happens on quad cores?).

Does this not adversely affect IRQ balancing on those CPUs?  If the IRQ
load of a mostly idle device changes after the daemon has exited,
wouldn't the balancer's inability to adjust adversely affect
performance? I'm used to my old SMP
box with 2 physical cores, so this is just something I've wondered about 
on the new box.  Thanks,
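
One way to check whether the load really shifts after the daemon has exited is to compare two snapshots of /proc/interrupts. The Python sketch below parses the per-CPU counters and reports the difference per IRQ. The function names are made up, and the parser assumes the usual layout (a header row of CPU names, then one row per interrupt source with a count per CPU); rows with fewer counters than CPUs are skipped.

```python
# Sketch: per-IRQ, per-CPU interrupt deltas between two snapshots of
# /proc/interrupts. Pure text processing; reading the file is left to
# the caller so the functions can be tried on saved output.


def parse_interrupts(text):
    """Map each IRQ label to its list of per-CPU counts."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        if len(per_cpu) == ncpu:  # skip rows without a count per CPU
            counts[label.strip()] = per_cpu
    return counts


def delta(before, after):
    """Per-CPU increase for every IRQ present in both snapshots."""
    return {
        irq: [a - b for a, b in zip(after[irq], before[irq])]
        for irq in after
        if irq in before
    }


def read_snapshot(path="/proc/interrupts"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Taking one snapshot while idle and another under heavy I/O, then looking at which CPU column grows for the busy device, would show whether the static assignment still matches the load.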

-Walt





Thread overview: 38+ messages
2007-11-20  4:12 CONFIG_IRQBALANCE for 64-bit x86 ? Mark Lord
2007-11-20  4:15 ` Ismail Dönmez
2007-11-20  4:17 ` Nick Piggin
2007-11-20  4:29   ` Willy Tarreau
2007-11-20  4:37     ` Adrian Bunk
2007-11-20  5:24       ` Nick Piggin
2007-11-20  5:28         ` H. Peter Anvin
2007-11-20  5:37   ` Arjan van de Ven
2007-11-20  7:37     ` Nick Piggin
2007-11-20 14:47       ` Arjan van de Ven
2007-11-20 15:43         ` Nick Piggin
2007-11-20 19:07           ` Arjan van de Ven
2007-11-20 20:02             ` Mark Lord
2007-11-20 21:58               ` Arjan van de Ven
2007-11-20 23:17                 ` Mark Lord
2007-11-22  7:54             ` Nick Piggin
2007-11-23 13:09               ` Ingo Molnar
2007-11-25 10:03                 ` Nick Piggin
2007-11-20 15:47         ` Mark Lord
2007-11-20 15:52           ` Mark Lord
2007-11-20 16:02             ` Arjan van de Ven
2007-11-20 16:10               ` Mark Lord
2007-11-20 18:42               ` Mark Lord
2007-11-20 22:01         ` Ingo Molnar
2007-11-20 23:22           ` Mark Lord
2007-11-20 23:27             ` Ingo Molnar
2007-11-20 23:33               ` H. Peter Anvin
2007-11-20 23:47                 ` Ingo Molnar
2007-11-20 23:50                   ` H. Peter Anvin
2007-11-21  0:07                     ` Ingo Molnar
2007-11-21  0:20                       ` H. Peter Anvin
2007-11-21  0:36                         ` Ingo Molnar
2007-11-21  0:47                           ` H. Peter Anvin
2007-11-21  2:48                           ` Jeff Garzik
2007-11-21  2:59                             ` H. Peter Anvin
2007-11-20 23:28             ` H. Peter Anvin
2007-11-20 19:17   ` Andi Kleen
2007-11-21  2:22 Walt H
